Support for IPv6 mapping type #3714

bodgit · 2013-09-17T10:43:10Z

Currently I can't use the ip mapping type because I have fields that can contain either IPv4 or IPv6 addresses. Range queries would be really useful, but I can't use them because I have to treat the field as a string to handle the case where it contains an IPv6 value.

Obviously this causes extra hassle as storage would then require 128 bits and when searching, range queries using IPv6 addresses shouldn't match IPv4 addresses, unless you're using the ::ffff:d.d.d.d notation, and IPv6 addresses shouldn't match IPv4 range queries at all.

(I found this thread from when this was raised previously.)

abh · 2013-11-26T00:12:48Z

I'd like to convert a postgresql based application to use ES but got hung up on missing this feature, too. The queries are using netmasks/cidrs so just having the IPv6 address as a string won't be "good enough".
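To illustrate why a plain string is not enough for netmask/CIDR queries, here is a minimal sketch using Python's standard `ipaddress` module. It turns a CIDR block into a numeric range and tests membership. The helper name `cidr_to_range` and the sample addresses are assumptions made for illustration; nothing here is ES behaviour.

```python
import ipaddress

def cidr_to_range(cidr):
    """Return the (first, last) addresses of a CIDR block as integers."""
    net = ipaddress.ip_network(cidr, strict=False)
    # broadcast_address is simply the last address of the block (IPv4 or IPv6)
    return int(net.network_address), int(net.broadcast_address)

lo, hi = cidr_to_range("2001:db8::/32")
addr = int(ipaddress.ip_address("2001:db8:0:1::42"))
print(lo <= addr <= hi)  # True
```

A CIDR match is a numeric range check, which is what a string field cannot express directly.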

dadoonet · 2013-11-26T05:55:06Z

For IP V6, just mark your field as not_analyzed in mapping.

abh · 2013-11-26T20:28:51Z

@dadoonet That doesn't make any sense.

dadoonet · 2013-11-26T21:19:20Z

Do you mean that you don't understand my answer, or that my answer does not answer your question?

abh · 2013-11-26T21:32:05Z

@dadoonet What's the point of the "ip type" if a reasonable answer to supporting IPv6 is "just make it a not analyzed string"? They're not the same thing, I'd hope.

dadoonet · 2013-11-26T22:34:18Z

IP type is only for IP v4. Type name should be ipv4 instead of ip.
For ipv6 I don't think a special type is needed. Keeping ipv6 as non tokenized string should do the job.

How do you expect IPv6 content to be converted?

abh · 2013-11-26T22:38:57Z

It could be converted to a number, for instance, and then allow range searches etc similar to the "ipv4" type.

Better yet the "ip type" should "just work" for both (similar to what postgresql does, for example).
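One way such a unified type could work is sketched below, assuming Python's standard `ipaddress` module. Every address becomes a single 128-bit integer, with IPv4 placed in the IPv4-mapped block `::ffff:0:0/96`. The helper name `to_key` and the choice of the mapped block are assumptions for this sketch, not a description of what ES or postgresql do internally.

```python
import ipaddress

def to_key(text):
    """Map an IPv4 or IPv6 address string to one 128-bit integer."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # Assumption of this sketch: store IPv4 as ::ffff:d.d.d.d
        return (0xFFFF << 32) | int(addr)
    return int(addr)

print(hex(to_key("192.0.2.1")))                           # 0xffffc0000201
print(to_key("::ffff:192.0.2.1") == to_key("192.0.2.1"))  # True
```

With this encoding an IPv4 range query never matches a native IPv6 address, because all IPv4 keys sit inside one /96 block.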

bodgit · 2013-11-26T22:50:58Z

There are ways of expressing IPv6 addresses that would likely fail a simple string-based match, the whole '::' expansion for one.
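A small sketch of the problem, using Python's standard `ipaddress` module: two spellings of the same address compare unequal as strings until both are normalised. The helper name `canonical` is hypothetical.

```python
import ipaddress

def canonical(text):
    """Return the fully expanded, lower-case form of an IPv6 address."""
    return ipaddress.IPv6Address(text).exploded

a = "2001:db8::1"
b = "2001:0DB8:0:0:0:0:0:1"

print(a == b)                        # False: same address, different strings
print(canonical(a) == canonical(b))  # True
print(canonical(a))                  # 2001:0db8:0000:0000:0000:0000:0000:0001
```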

dadoonet · 2013-11-27T09:17:00Z

@bodgit Very good point! I'm going to think about it a bit more.

lifo101 · 2013-12-17T13:46:11Z

I have an app that stores iPv4/6 addresses as DECIMAL(39,0) in a mysql database which allows for very easy range searching. I wait for the day when ES will support something similar for IPv6 so I can finally use ES for indexing my database.

jvbrandis · 2014-01-17T17:44:58Z

When storing IPv6 addresses, I store it as a "fully formatted" IPv6-string, i.e. XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX, regardless of zeros (so, never shortening a segment to less than four digits, and never shortcutting segments with ::).
This way, all IPv6 addresses are fully sortable and searchable, so this should work with ES (using not-analyzed mapping). However, it is very space-consuming when comparing to what an IPv6 address really is, which is 16 bytes (while this becomes 40 bytes...)
Also, if putting IPv4-addresses into this mix, sorting/filtering on a range will lead to problems mixing IPv4 and IPv6.
(This could be solved by using the IPv4-mapping format of IPv6, that is all IPv4 addresses are stored as IPv6 as ::FFFF:XXXX:XXXX (last four bytes being the IPv4 address)

The other approach is to store as a binary field, using 16 bytes, (still storing IPv4-addresses in IPv4-mapped IPv6-format). My approach to this in mysql is actually a BINARY(16) column.
However, this is inconvenient as manually browsing/inspecting the data becomes cumbersome.
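As a rough check of both approaches, the sketch below (Python standard library only, sample addresses made up) sorts the same addresses by their fully expanded string and by their 16-byte binary form. Both orders agree with numeric order, while sorting the shortened strings does not.

```python
import ipaddress

addrs = ["2001:db8::2", "::1", "2001:db8::10", "fe80::1"]
parsed = [ipaddress.IPv6Address(a) for a in addrs]

by_string = sorted(parsed, key=lambda a: a.exploded)  # fully formatted text
by_bytes = sorted(parsed, key=lambda a: a.packed)     # 16 raw bytes
by_value = sorted(parsed)                             # numeric order

print(by_string == by_bytes == by_value)                   # True
print(sorted(addrs) == [str(a) for a in by_value])         # False: shortened strings misorder
print(len(parsed[0].exploded), len(parsed[0].packed))      # 39 16
```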

So; I am also eagerly awaiting ES support for IPv6, storing IPv6-addresses in numeric format, but with support for properly displaying them and accepting query parameters in IP-format.

cpdean · 2014-03-28T18:35:15Z

+1. would like this feature

ioc32 · 2014-03-29T17:03:47Z

As @abh pointed out the fact ES is not currently supporting both protocols equally is a show stopper for many applications - to be ported or to be implemented from scratch.

ES is pretty much becoming a de facto standard when it comes to scalable event storage, search and analysis. In my particular use case, and I do not think I am the only one here, I deal with IPv4 just as much as with IPv6. Having both address families under a single, coherent data type is to be desired. Mappings, queries, indexing... would become unified and consequently easier to use for everyone.

@dadoonet I wonder what's the reason for ES to support IPv4-only data types in the first place. Was it a technical decision due to implementation difficulties, was it a matter of priorities? Or, on the other hand, was it a consequence of you guys perceiving ES users did not care about IPv6? Is it at least in your roadmap?

dadoonet · 2014-03-29T18:06:39Z

@ioc32 It is on my TODO list for sure! I need to find some quiet time to work on it.

ioc32 · 2014-03-29T18:09:53Z

@dadoonet great! Thank you for updating us!

kimchy · 2014-03-29T20:19:26Z

the reason is simple, ipv4 can easily be translated to 64bit long, which supports range constructs, ipv6 is more complex.

cpdean · 2014-03-30T22:21:17Z

definitely looking forward to this. It'll really round out the ELK stack for feature complete network analysis. thanks!

kimchy · 2014-03-31T01:51:28Z

understood, though for now, if you can get around with prefix checks, you can map the IP as string.
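A sketch of how such a prefix check could be prepared on the client side, assuming addresses are indexed in their fully expanded form and the network prefix falls on a hex-digit (4-bit) boundary. The helper name `string_prefix` is hypothetical, and the code uses only Python's standard `ipaddress` module.

```python
import ipaddress

def string_prefix(cidr):
    """Return the expanded-string prefix that matches every address in cidr.

    Only works when the prefix length is a multiple of 4 bits, because one
    character of the expanded form encodes exactly one hex digit.
    """
    net = ipaddress.IPv6Network(cidr)
    if net.prefixlen % 4:
        raise ValueError("prefix length must fall on a hex-digit boundary")
    digits = net.network_address.exploded.replace(":", "")[: net.prefixlen // 4]
    # Re-insert the colons after every group of four hex digits
    return ":".join(digits[i:i + 4] for i in range(0, len(digits), 4))

prefix = string_prefix("2001:db8::/32")
print(prefix)  # 2001:0db8
print(ipaddress.IPv6Address("2001:db8:1::5").exploded.startswith(prefix))  # True
```

Prefixes that are not a multiple of 4 bits, and arbitrary start/end ranges, cannot be expressed this way, which is the limitation implied by "if you can get around with prefix checks".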

zachfi · 2014-04-01T02:56:21Z

+1 to defending @dadoonet's quiet time. I'd love to see this happen.

Dunaeth · 2014-04-03T05:07:50Z

Wouldn't it be possible to use fixed-length Lucene binary field types for IPs and use binary sorting? (I read about binary UTF-8 sorting in Lucene, but I lack some skills on the subject.)

jpountz · 2014-04-03T06:26:47Z

It is indeed possible to encode ipv6 ips as binary fields, Lucene doesn't require index terms to be UTF-8 sequences, it can be anything. The challenge here is more that for IPs, we need to support efficient ranges because that's typically how these fields are filtered. Lucene provides support for efficient ranges with numeric fields (see NumericRangeQuery): basically every field gets indexed with different precision levels, and this allows range queries to visit few terms no matter how large the range is (the fewer terms are visited the more efficient queries are). So we would need a similar mechanism for storing ipv6 addresses.

seti123 · 2014-04-03T21:04:04Z

+1

clintongormley · 2014-07-11T09:41:26Z

Depends on https://issues.apache.org/jira/browse/LUCENE-5596

avleen · 2015-07-14T21:52:34Z

I see we're still blocked on Lucene's support for BigInt for this. But that ticket hasn't seen any action in a while either.
Any updates for this @clintongormley? IPv6 is becoming a real thing, so this would be really handy :-)

zachfi · 2015-07-15T19:35:35Z

It's a 14-year-old protocol. We're well beyond 'real thing' :)

jpountz · 2015-07-15T21:57:17Z

The Lucene issue is stalled indeed, as it proved very hard to integrate... The feature is currently exposed as an experimental postings format which is not supported in terms of backward compatibility.

With small numbers (up to 64 bits) today we have static pre-computed ranges, which is probably fine. For instance for ints (32 bits) we have a default precision step of 8 bits which means that we pre-compute ranges for all numbers that have the same 24, 16 or 8 upper bits (0-256, 256-512, 512-768, ..., 0-65536, 65536-131072, 131072-196608, ..., 0-16777216, 16777216-33554432, 33554432-50331648, ...). Any arbitrary range can be translated to a union of these pre-computed ranges, and this is the way we manage to have fast ranges on numerics.
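The "union of pre-computed ranges" idea can be sketched roughly as follows. This is a simplified illustration written for this thread, not Lucene's actual NumericRangeQuery code; the function name `split_range` and the `(start, shift)` block representation are assumptions.

```python
def split_range(lo, hi, bits=32, step=8):
    """Cover the inclusive range [lo, hi] with aligned blocks.

    Each block is (start, shift) and spans 2**shift consecutive values.
    Allowed shifts are 0, step, 2*step, ... up to bits - step, mirroring
    the full-precision terms plus the pre-computed coarser ranges.
    """
    blocks = []
    while lo <= hi:
        shift = 0
        # Grow the block while it stays aligned and still fits in the range
        while shift + step < bits:
            size = 1 << (shift + step)
            if lo % size != 0 or lo + size - 1 > hi:
                break
            shift += step
        blocks.append((lo, shift))
        lo += 1 << shift
    return blocks

blocks = split_range(250, 520)
print(len(blocks))                          # 16
print(sum(1 << s for _, s in blocks))       # 271 values covered
print(split_range(0, 65535))                # [(0, 16)]
```

In the first call, the 271 values are covered by 16 blocks: six single values, one 256-wide block starting at 256, and nine single values.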

With high numbers of bits, like 128 here, the space-time trade-off becomes tricky I think. For instance with a precision step of 16, we would have to index 8 tokens per value while range queries would still visit hundreds of thousands of terms in the worst-case.
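The arithmetic behind that trade-off can be worked through as below. The worst-case formula is the estimate given in Lucene's NumericRangeQuery documentation as far as I recall it, so treat it as an assumption; the function names are hypothetical.

```python
def tokens_per_value(bits, step):
    """Number of terms indexed per value: one per precision level."""
    return -(-bits // step)  # ceiling division

def worst_case_terms(bits, step):
    """Assumed worst-case number of terms a range query visits.

    Every level except the top one contributes up to (2**step - 1) terms
    on each side of the range; the top level contributes (2**step - 1).
    """
    levels = tokens_per_value(bits, step)
    per_level = (1 << step) - 1
    return (levels - 1) * per_level * 2 + per_level

print(tokens_per_value(128, 16))   # 8
print(worst_case_terms(128, 16))   # 983025
print(worst_case_terms(64, 8))     # 3825
```

Under this estimate, 128 bits with a precision step of 16 gives 8 tokens per value and a worst case of several hundred thousand terms.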

Given that ipv6 addresses tend to use the lower bytes less, maybe that would be fine, but I'm a bit reluctant to expose a new field type for ipv6 addresses that would not perform well for range queries. An option could be to have a new type for ipv6 addresses that would only support sorting and aggs but not queries, however I'm not sure how useful it would be?

avleen · 2015-07-16T01:15:52Z

Agreed.
/64's are the smallest allocations that are generally given out, so searching for a range may not (initially) need more precision than that. If we see an IPv6 address, we can store the range of the /64 it is in, and then work up from there?
/64, /32, /16, /8, /4, /2, /1, /0
That's 8 bits there, and from a practical perspective it might be sufficient. Most end users get a /64, which makes searching in that easy. ISPs get at least /32 sized blocks.

Most IPv6 addresses allocated today, when converted to decimal, are about 38 bytes. That means 76 bytes (upper and lower bounds) to store each range. So about 600 bytes of storage required for the precision, per address, in addition to the ~38 bytes for the address itself. That's quite a lot, but that's really just the way it is - we can't make these numbers smaller ;-)

If we restrict range searches to at least /64, could this then work out?


bodgit · 2015-07-16T08:06:57Z

How would this work with a type that handles both IPv4 and IPv6? As I originally stated in my use case I don't know the address family ahead of time, only that it is "an IP address" so I would prefer a type that can handle both. If that meant storing IPv4 addresses as IPv6-mapped it means that for such addresses, you do care about the lesser significant bits more as the address is ::ffff:d.d.d.d and so the first 96 bits are always going to be the same.

avleen · 2015-09-25T02:36:41Z

FWIW, ARIN announced depletion of their free IP pool today:
http://teamarin.net/category/ipv4-depletion/

hanej · 2015-09-28T21:01:26Z

Our access logs use a combination of IPv6 and IPv4 in the same field so we're in the same situation as @bodgit

avleen · 2015-09-30T00:56:53Z

The Lucene ticket mentioned above isn't being worked on.
Instead they implemented a different way of doing things, which could enable an ipv6 type:
https://issues.apache.org/jira/browse/LUCENE-5879
But I think it might be up to Elasticsearch to implement that on top of the work they did on the auto-prefix terms?

rmuir · 2015-09-30T01:56:16Z

That's not really true. @mikemccand and @nknize are hard at work, and have been for a long time, adding all kinds of experimental data structures to lucene: to better solve the issues of numeric-like fields, spatial data structures, etc.

Another one that is promising for cases like this is https://issues.apache.org/jira/browse/LUCENE-6697

But there is still work to do, to graduate them from the sandbox: for example (this is not criticism, these guys are iterating and that is how it goes), some of these formats create large files in /tmp during merge. This kind of "sandy" stuff has to be cleaned up before they are production-strength.

Furthermore integrating them is a little tricky, in the past everyone has jumped to build numerics/spatial on top of what lucene already had (things like inverted index structures), and currently I see them still "wedging" the new stuff behind those apis.

I think in order to fix it properly, we have to expand the index format (Codec apis) with abstractions for these kinds of data structures, simple ones we can live with, improve for users over minor releases, and support backwards compatibility for. We can't just shove this stuff out there quickly: exposing these kinds of features means we are committing ourselves to long-term backwards compatibility of the format, that is one reason it takes longer.

I am not really following all that closely, nobody can keep up with those guys, so I might be wrong, but this is just my high level view on the thing. It's not that we are lazy and don't care about IPv6 or anything like that.

avleen · 2015-09-30T17:15:03Z

Robert, I don't for a moment think you, or anyone working on ES or Lucene, is lazy. You folks all do incredible work and give it to us for free. We're very grateful for your efforts.

I think ipv6 is just a big deal to a lot of people, which is why we see so much interest in this issue, and we're just waiting for the technology to catch up to our needs :)


kjelle · 2016-02-04T11:51:42Z

+1

kkirsche · 2016-02-24T20:48:08Z

+1 This would be extremely helpful

nknize · 2016-02-24T20:54:03Z

You're in luck. Thx to @rmuir this is getting closer. https://issues.apache.org/jira/browse/LUCENE-7043

damm · 2016-03-28T05:33:41Z

Yes it should be closer; I hope ES 5? :)

jpountz · 2016-04-17T11:06:05Z

Fixed via #17746

zachfi · 2016-04-17T17:51:59Z

Thank you for the effort(s).

kkirsche · 2016-04-17T19:06:31Z

Thank you for the work on this!

bodgit · 2016-04-18T09:01:52Z

🍰 🎉 👍

bananabr · 2016-09-26T17:26:18Z

+1

ghost assigned dadoonet Nov 27, 2013

kzwang mentioned this issue Apr 10, 2014

add ipv6 field support #5758

Closed

clintongormley added the stalled label Jul 11, 2014

regit mentioned this issue Aug 1, 2014

Mapping all IP fields to type "ip" StamusNetworks/SELKS#6

Open

dadoonet removed their assignment Oct 3, 2014

clintongormley added the >feature, high hanging fruit, and :Search Foundations/Mapping labels Sep 21, 2015

pierky mentioned this issue Dec 29, 2015

define dynamic template of ip_* pierky/pmacct-to-elasticsearch#4

Closed

megastef mentioned this issue Jan 27, 2016

ignore_malformed to support ignoring JSON objects ingested into fields of the wrong type #12366

Open

jpountz closed this as completed Apr 17, 2016

javanna added the Team:Search Foundations label Jul 16, 2024
