Support for IPv6 mapping type #3714
I'd like to convert a postgresql based application to use ES but got hung up on missing this feature, too. The queries are using netmasks/cidrs so just having the IPv6 address as a string won't be "good enough". |
For IP V6, just mark your field as not_analyzed in mapping. |
@dadoonet That doesn't make any sense. |
Do you mean that you don't understand my answer, or that my answer does not answer your question? |
@dadoonet What's the point of the "ip type" if a reasonable answer to supporting IPv6 is "just make it a not analyzed string"? They're not the same thing, I'd hope. |
The IP type is only for IPv4. The type name should be ipv4 instead of ip. How do you expect IPv6 content to be converted? |
It could be converted to a number, for instance, and then allow range searches etc similar to the "ipv4" type. Better yet the "ip type" should "just work" for both (similar to what postgresql does, for example). |
There are ways of expressing IPv6 addresses that would likely fail a simple string-based match, the whole '::' expansion for one. |
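To illustrate the '::' expansion point, here is a minimal sketch using Python's standard `ipaddress` module. The sample addresses (from the 2001:db8::/32 documentation range) are chosen for illustration and do not come from this thread. It shows that several valid spellings of one address are distinct as strings but identical once parsed to a number.

```python
import ipaddress

# Three valid spellings of the same IPv6 address (illustrative values).
spellings = [
    "2001:db8::1",
    "2001:0db8:0000:0000:0000:0000:0000:0001",
    "2001:DB8:0:0:0:0:0:1",
]

# Compared as raw strings, every spelling is a different term.
distinct_strings = len(set(spellings))

# Parsed, they all collapse to the same 128-bit integer.
as_ints = {int(ipaddress.IPv6Address(s)) for s in spellings}
```

A not_analyzed string field would treat these as three unrelated values, which is why some normalization step is needed before exact matching can work.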
@bodgit Very good point! I'm going to think about it a bit more. |
I have an app that stores IPv4/6 addresses as DECIMAL(39,0) in a MySQL database, which allows for very easy range searching. I wait for the day when ES will support something similar for IPv6 so I can finally use ES for indexing my database. |
When storing IPv6 addresses, I store them as a "fully formatted" IPv6 string, i.e. XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX, regardless of zeros (so, never shortening a segment to less than four digits, and never shortcutting segments with ::). The other approach is to store them as a binary field, using 16 bytes (still storing IPv4 addresses in IPv4-mapped IPv6 format). My approach to this in MySQL is actually a BINARY(16) column. So I am also eagerly awaiting ES support for IPv6, storing IPv6 addresses in numeric format, but with support for properly displaying them and accepting query parameters in IP format. |
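The three storage forms mentioned in the last two comments (a 39-digit decimal, 16 raw bytes, and the fully expanded string) can be sketched with Python's standard `ipaddress` module. The helper names `to_ipv6` and `storage_forms` are hypothetical, and the expanded string is built by hand from the integer so the output does not depend on how a particular Python version prints IPv4-mapped addresses.

```python
import ipaddress

def to_ipv6(text):
    """Parse IPv4 or IPv6 text; IPv4 is returned as IPv4-mapped IPv6 (::ffff:a.b.c.d)."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # 80 zero bits, 16 one bits, then the 32-bit IPv4 address.
        addr = ipaddress.IPv6Address(b"\x00" * 10 + b"\xff\xff" + addr.packed)
    return addr

def storage_forms(text):
    """Return the three representations discussed above for one address."""
    addr = to_ipv6(text)
    n = int(addr)
    return {
        "decimal": n,             # at most 39 decimal digits, fits DECIMAL(39,0)
        "binary16": addr.packed,  # 16 bytes, fits BINARY(16)
        # Eight zero-padded groups, never shortened and never using '::'.
        "expanded": ":".join(f"{(n >> s) & 0xFFFF:04x}" for s in range(112, -1, -16)),
    }

forms_v4 = storage_forms("192.0.2.1")
forms_v6 = storage_forms("2001:db8::1")
```

All three forms sort in the same order as the underlying 128-bit number, which is what makes range comparisons possible on any of them.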
+1. would like this feature |
As @abh pointed out, the fact that ES does not currently support both protocols equally is a show stopper for many applications, whether they are to be ported or implemented from scratch. ES is pretty much becoming a de facto standard when it comes to scalable event storage, search and analysis. In my particular use case, and I do not think I am the only one here, I deal with IPv4 just as much as with IPv6. Having both address families under a single, coherent data type is to be desired. Mappings, queries, indexing... would become unified and consequently easier to use for everyone. @dadoonet I wonder what the reason is for ES to support IPv4-only data types in the first place. Was it a technical decision due to implementation difficulties, or was it a matter of priorities? Or, on the other hand, was it a consequence of you guys perceiving that ES users did not care about IPv6? Is it at least on your roadmap? |
@ioc32 It is on my TODO list for sure! I need to find some quiet time to work on it. |
@dadoonet great! Thank you for updating us! |
the reason is simple, ipv4 can easily be translated to 64bit long, which supports range constructs, ipv6 is more complex. |
definitely looking forward to this. It'll really round out the ELK stack for feature complete network analysis. thanks! |
understood, though for now, if you can get around with prefix checks, you can map the IP as string. |
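A rough sketch of the string prefix workaround, assuming addresses are indexed in the fully expanded form described earlier in the thread. The helpers `expanded`, `string_prefix` and `matches` are hypothetical names, and the approach only works when the prefix length is a multiple of 4 bits, since one hex digit covers 4 bits.

```python
import ipaddress

def expanded(n):
    """Format a 128-bit integer as eight zero-padded hex groups."""
    return ":".join(f"{(n >> s) & 0xFFFF:04x}" for s in range(112, -1, -16))

def string_prefix(cidr):
    """Return the string prefix shared by every expanded address in the CIDR block."""
    net = ipaddress.IPv6Network(cidr)
    if net.prefixlen % 4:
        raise ValueError("prefix length must be a multiple of 4 bits")
    nibbles = net.prefixlen // 4
    full = expanded(int(net.network_address))
    # One colon follows every complete group of four hex digits.
    return full[: nibbles + nibbles // 4]

def matches(address, cidr):
    """Emulate a prefix query against the expanded string form."""
    addr = expanded(int(ipaddress.IPv6Address(address)))
    return addr.startswith(string_prefix(cidr))
```

For example, `string_prefix("2001:db8::/32")` gives `"2001:0db8:"`. Arbitrary ranges, or prefix lengths such as /33, cannot be expressed this way.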
+1 to defending @dadoonet's quiet time. I'd love to see this happen. |
Wouldn't it be possible to use fixed-length Lucene binary field types for IPs and use binary sorting (I read about binary UTF-8 sorting in Lucene, but I lack some skills on the subject)? |
It is indeed possible to encode ipv6 ips as binary fields, Lucene doesn't require index terms to be UTF-8 sequences, it can be anything. The challenge here is more that for IPs, we need to support efficient ranges because that's typically how these fields are filtered. Lucene provides support for efficient ranges with numeric fields (see NumericRangeQuery): basically every field gets indexed with different precision levels, and this allows range queries to visit few terms no matter how large the range is (the fewer terms are visited the more efficient queries are). So we would need a similar mechanism for storing ipv6 addresses. |
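The "different precision levels" idea can be modelled in a few lines. This is a simplified illustration only, not Lucene's actual term encoding; `prefix_terms` and its parameters are hypothetical names. Each value is indexed once at full precision and then again with progressively more low-order bits dropped.

```python
def prefix_terms(value, bits=32, step=8):
    """Return the (shift, prefix) terms indexed for one value.

    shift=0 is the full-precision term; each further term drops
    `step` more low-order bits, so it is shared by a whole block
    of neighbouring values.
    """
    return [(shift, value >> shift) for shift in range(0, bits, step)]

# A 32-bit value with a step of 8 bits yields 4 terms.
terms_int = prefix_terms(0x0A010203)

# A 128-bit value with a step of 16 bits yields 8 terms.
terms_v6 = prefix_terms(1 << 127, bits=128, step=16)
```

A range query can then match one coarse term in place of every individual value inside the block that term covers.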
+1 |
I see we're still blocked on Lucene's support for BigInt for this. But that ticket hasn't seen any action in a while either. |
It's a 14 year old protocol. We're well beyond 'real thing' :) |
The Lucene issue is stalled indeed, as it proved very hard to integrate... The feature is currently exposed as an experimental postings format which is not supported in terms of backward compatibility. With small numbers (up to 64 bits) today we have static pre-computed ranges, which is probably fine. For instance for ints (32 bits) we have a default precision step of 8 bits which means that we pre-compute ranges for all numbers that have the same 24, 16 or 8 upper bits (0-256, 256-512, 512-768, ..., 0-65536, 65536-131072, 131072-196608, ..., 0-16777216, 16777216-33554432, 33554432-50331648, ...). Any arbitrary range can be translated to a union of these pre-computed ranges, and this is the way we manage to have fast ranges on numerics. With high numbers of bits, like 128 here, the space-time trade-off becomes tricky I think. For instance with a precision step of 16, we would have to index 8 tokens per value while range queries would still visit hundreds of thousands of terms in the worst-case. Given that ipv6 addresses tend to use the lower bytes less, maybe that would be fine, but I'm a bit reluctant to expose a new field type for ipv6 addresses that would not perform well for range queries. An option could be to have a new type for ipv6 addresses that would only support sorting and aggs but not queries, however I'm not sure how useful it would be? |
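The translation of an arbitrary range into a union of pre-computed blocks can be sketched as follows. This is a simplified greedy splitter written for illustration, not Lucene's implementation; `split_range` and its parameters are hypothetical names. Every returned `(shift, prefix)` pair stands for one term a query would have to visit.

```python
def split_range(lo, hi, bits=32, step=8):
    """Cover the inclusive range [lo, hi] with aligned blocks.

    Each block has size 2**shift, where shift is a multiple of `step`,
    and starts at a multiple of its own size.
    """
    blocks = []
    while lo <= hi:
        shift = 0
        # Grow the block while it stays aligned and inside the range.
        while (shift + step < bits
               and lo % (1 << (shift + step)) == 0
               and lo + (1 << (shift + step)) - 1 <= hi):
            shift += step
        blocks.append((shift, lo >> shift))
        lo += 1 << shift
    return blocks

# 271 values covered by 16 terms: 6 single values, one 256-wide block, 9 single values.
small = split_range(250, 520)

# An IPv6 /64 with a 16-bit step is aligned to a block boundary, so it is one term.
lo64 = 0x20010DB800000000 << 64
one_term = split_range(lo64, lo64 + (1 << 64) - 1, bits=128, step=16)
```

Ranges that start and end on block boundaries stay cheap, while ranges with ragged edges need many fine-grained terms at each level, which is the space-time trade-off described above.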
Agreed. Most IPv6 addresses allocated today, when converted to decimal are about 38 If we restrict range searches to at least /64, could this then work out? |
How would this work with a type that handles both IPv4 and IPv6? As I originally stated in my use case I don't know the address family ahead of time, only that it is "an IP address" so I would prefer a type that can handle both. If that meant storing IPv4 addresses as IPv6-mapped it means that for such addresses, you do care about the lesser significant bits more as the address is |
FWIW, ARIN announced depletion of their free IP pool today: |
Our access logs use a combination of IPv6 and IPv4 in the same field so we're in the same situation as @bodgit |
The Lucene ticket mentioned above isn't being worked on. |
That's not really true. @mikemccand and @nknize are hard at work, and have been for a long time, adding all kinds of experimental data structures to Lucene to better solve the issues of numeric-like fields, spatial data structures, etc. Another one that is promising for cases like this is https://issues.apache.org/jira/browse/LUCENE-6697 But there is still work to do to graduate them from the sandbox: for example (this is not criticism, these guys are iterating and that is how it goes), some of these formats create large files in /tmp during merge. This kind of "sandy" stuff has to be cleaned up before they are production-strength. Furthermore, integrating them is a little tricky: in the past everyone has jumped to build numerics/spatial on top of what Lucene already had (things like inverted index structures), and currently I see them still "wedging" the new stuff behind those APIs. I think in order to fix it properly, we have to expand the index format (Codec APIs) with abstractions for these kinds of data structures, simple ones we can live with, improve for users over minor releases, and support backwards compatibility for. We can't just shove this stuff out there quickly: exposing these kinds of features means we are committing ourselves to long-term backwards compatibility of the format, and that is one reason it takes longer. I am not really following all that closely, nobody can keep up with those guys, so I might be wrong, but this is just my high-level view on the thing. It's not that we are lazy and don't care about IPv6 or anything like that. |
Robert, I don't for a moment think you, or anyone working on ES or Lucene I think ipv6 is just a big deal to a lot of people, which is why we see so |
+1 |
+1 This would be extremely helpful |
You're in luck. Thx to @rmuir this is getting closer. https://issues.apache.org/jira/browse/LUCENE-7043 |
Yes it should be closer; I hope ES 5? :) |
Fixed via #17746 |
Thank you for the effort(s). |
Thank you for the work on this! |
🍰 🎉 👍 |
+1 |
Currently I can't use the ip mapping type as I have fields that can be either IPv4 or IPv6. However, being able to use range queries is really useful but I can't make use of them because I have to treat the field as a string to handle the case when the field contains an IPv6 value.
Obviously this causes extra hassle as storage would then require 128 bits and when searching, range queries using IPv6 addresses shouldn't match IPv4 addresses, unless you're using the ::ffff:d.d.d.d notation, and IPv6 addresses shouldn't match IPv4 range queries at all.
(I found this thread when this has been raised previously)
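The matching rules described in this post can be sketched with Python's standard `ipaddress` module, assuming IPv4 addresses are stored in IPv4-mapped form so that every value becomes a 16-byte key. The helper names (`addr_key`, `key`, `cidr_bounds`, `in_cidr`) are hypothetical, and this illustrates the semantics only, not how Elasticsearch implements them.

```python
import ipaddress

V4_MAPPED_PREFIX = b"\x00" * 10 + b"\xff\xff"  # the ::ffff:0:0/96 prefix

def addr_key(addr):
    """16-byte key; IPv4 is placed under the IPv4-mapped prefix."""
    return V4_MAPPED_PREFIX + addr.packed if addr.version == 4 else addr.packed

def key(text):
    """Key for an address given as IPv4 or IPv6 text."""
    return addr_key(ipaddress.ip_address(text))

def cidr_bounds(cidr):
    """Lowest and highest key covered by an IPv4 or IPv6 CIDR block."""
    net = ipaddress.ip_network(cidr, strict=False)
    return addr_key(net.network_address), addr_key(net.broadcast_address)

def in_cidr(text, cidr):
    """Range check on keys; equal-length bytes compare in numeric order."""
    lo, hi = cidr_bounds(cidr)
    return lo <= key(text) <= hi
```

With this layout, `in_cidr("10.1.2.3", "10.0.0.0/8")` and `in_cidr("10.1.2.3", "::ffff:10.0.0.0/104")` both hold, while an ordinary IPv6 block such as `2001:db8::/32` never matches an IPv4 address, and an IPv4 block never matches a native IPv6 address.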