[FEATURE] Add Support for IP Data Type #3145

Open
currantw opened this issue Oct 31, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@currantw
currantw commented Oct 31, 2024

Is your feature request related to a problem?

OpenSearch SQL plugin does not support the IP address field type.

🚫 IP address fields are converted to strings:

search=weblog | fields host

returns host with field type string.

🚫 IP address fields cannot be correctly compared for equality:

search=weblog | where host = "2001:0db7::ff00:42:8329" | fields host

will not return the value 2001:0db7:0000:0000:0000:ff00:0042:8329, even though both strings represent the same IP address.
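
The mismatch is easy to reproduce outside the plugin. The following is a minimal sketch that uses Python's standard `ipaddress` module purely for illustration (the plugin itself is written in Java, and this is not its implementation). Comparing the raw strings fails, while comparing the parsed addresses succeeds:

```python
import ipaddress

short = "2001:0db7::ff00:42:8329"
full = "2001:0db7:0000:0000:0000:ff00:0042:8329"

# Comparing the raw text treats the two spellings as different values.
strings_equal = short == full

# Parsing first compares the underlying 128-bit address instead.
addresses_equal = ipaddress.ip_address(short) == ipaddress.ip_address(full)
```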

What solution would you like?

Outcomes:

  • IP addresses can be retrieved using the OpenSearch SQL plugin without conversion to strings.
  • IP addresses support equality operations (= and !=).
  • IP addresses support comparison operations (<, <=, >, and >=) if both are IPv4 or both are IPv6.
  • IP addresses support sorting (again, if they are all IPv4 or all IPv6).
  • IP addresses work with IP-specific functions (currently only cidrmatch - see Add CIDR function to PPL (#3036) #3110).
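
To illustrate why the string representation is not enough for these outcomes, here is a small sketch in Python (standard `ipaddress` module; an illustration of the desired semantics, not the plugin's Java code). The helper `cidr_contains` is a hypothetical stand-in for `cidrmatch(ip, cidr)`, and its handling of mixed IP versions is an assumption of this sketch.

```python
import ipaddress

hosts = ["10.0.0.10", "10.0.0.9", "192.168.1.1"]

# Sorting the text compares character by character, so "10.0.0.10" sorts before "10.0.0.9".
text_order = sorted(hosts)

# Sorting parsed addresses of the same version orders them numerically.
ip_order = [str(a) for a in sorted(ipaddress.ip_address(h) for h in hosts)]


def cidr_contains(ip: str, cidr: str) -> bool:
    """Hypothetical stand-in for cidrmatch(ip, cidr): is the address inside the range?"""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)
```

With string-typed fields, a query sorted on `host` would produce the first ordering; a dedicated IP type would allow the second.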

Proposed Solution:

  • Add a new IP type to ExprCoreType.
  • Replace OpenSearchExprIpValue with ExprIpValue, and update implementation.
  • Update OpenSearchDataType.MappingType to map "ip" fields to ExprCoreType.IP.
  • Update OpenSearchExprValueFactory.
  • Update other code, unit tests, and integration tests as necessary.

What alternatives have you considered?

None

Do you have any additional context?

This is closely related to #3110, which added a new cidrmatch(ip, cidr) function that returns whether the given IP address is within the specified CIDR range. As part of that work, the SQL plugin was updated to cast IP addresses to strings; previously, it raised an exception.

@currantw currantw added enhancement New feature or request untriaged labels Oct 31, 2024
@kedbirhan

we really need this feature, we can't even do basic IP lookups right now!

query

SELECT * 
FROM logs-vpc
WHERE dstaddr = "10.100.138.82"
LIMIT 10;

log

400 Bad Request:
{
  "error": {
    "reason": "Invalid SQL query",
    "details": "= function expected {[BYTE,BYTE],[SHORT,SHORT],[INTEGER,INTEGER],[LONG,LONG],[FLOAT,FLOAT],[DOUBLE,DOUBLE],[STRING,STRING],[BOOLEAN,BOOLEAN],[DATE,DATE],[TIME,TIME],[DATETIME,DATETIME],[TIMESTAMP,TIMESTAMP],[INTERVAL,INTERVAL],[STRUCT,STRUCT],[ARRAY,ARRAY]}, but get [IP,STRING]",
    "type": "ExpressionEvaluationException"
  },
  "status": 400
}

@YANG-DB
Member

YANG-DB commented Nov 15, 2024

Hi @currantw - this is a great idea.
My only concern is how to differentiate between a keyword field and an ip field. The engine can infer this implicitly, but it's not always the case that a user knows this in advance.

Can we explicitly use an ip syntax so that both the query writer and the query parser engine are aware that this field is expected to be an ip and not text/keyword?

This should be applicable to any of the <, <=, >, and >= predicate operators:

search=weblog | where host = ip("2001:0db7::ff00:42:8329") | fields host
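
One way to read this suggestion: the explicit `ip(...)` call validates and converts the literal up front, so a malformed value fails at parse time instead of silently being compared as text. The sketch below shows that idea in Python with the standard `ipaddress` module. The function name `ip` mirrors the syntax proposed above, but its behaviour here (including the error raised) is an assumption for illustration, not the plugin's implementation.

```python
import ipaddress
from typing import Union

IpAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def ip(literal: str) -> IpAddress:
    """Hypothetical analogue of ip("..."): validate and convert a string literal."""
    try:
        return ipaddress.ip_address(literal)
    except ValueError as err:
        # Reject anything that is not an IP address instead of treating it as a keyword.
        raise ValueError(f"not an IP address literal: {literal!r}") from err


# Both spellings convert to the same typed value, so the predicate matches.
matches = ip("2001:0db7::ff00:42:8329") == ip("2001:db7:0:0:0:ff00:42:8329")
```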

@andrross
Member

[Catch All Triage - 1, 2, 3, 4, 5]

currantw added a commit to Bit-Quill/opensearch-project-sql that referenced this issue Nov 22, 2024
…ort for IP address type).

Signed-off-by: currantw <[email protected]>
@currantw
Author

currantw commented Nov 22, 2024

PROPOSED SOLUTION

  • Add a new IP type to ExprCoreType with compatible type STRING.
  • Replace OpenSearchExprIpValue with ExprIpValue.
  • Store the IP address internally as an IPAddressString.
  • Implement equal to support comparison with an IP address or a string.
  • Implement compare to support comparison with an IP address or a string of the same type (i.e. IPv4 or IPv6).
  • Implement stringValue and toString to return the original string.
  • Replace OpenSearchExprIpValueTest with ExprIpValueTest and update tests accordingly.
  • Update OpenSearchDataType.MappingType to map ip fields to ExprCoreType.IP instead of ExprCoreType.UNKNOWN.
  • Update OpenSearchExprValueFactory and corresponding tests.
  • Add new method ip that takes an ExprStringValue and returns an ExprIpValue.
  • Update cidrmatch function to support both IP address and string data types (for the first argument). Also update corresponding unit and integration tests.
  • Update other code, unit tests, and integration tests as necessary.
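
The value semantics listed above (equality with an IP address or a string, ordering only within one IP version, and string output that preserves the original text) can be sketched as follows. This is a rough Python model with hypothetical names (`IpValue`, `compare`, `string_value`); the actual ExprIpValue would be a Java class and may differ in every detail.

```python
import ipaddress


class IpValue:
    """Hypothetical Python stand-in for the proposed ExprIpValue (not the plugin's code)."""

    def __init__(self, text: str):
        self._text = text                        # original spelling, kept for string output
        self._addr = ipaddress.ip_address(text)  # parsed form, used for comparisons

    @staticmethod
    def _coerce(other):
        """Accept another IpValue or a parseable string; return None for anything else."""
        if isinstance(other, IpValue):
            return other
        if isinstance(other, str):
            try:
                return IpValue(other)
            except ValueError:
                return None
        return None

    def __eq__(self, other):
        other = self._coerce(other)
        if other is None:
            return NotImplemented
        return self._addr == other._addr

    def __hash__(self):
        return hash(self._addr)

    def compare(self, other) -> int:
        """Return -1, 0, or 1; only defined when both sides share an IP version."""
        other = self._coerce(other)
        if other is None or other._addr.version != self._addr.version:
            raise TypeError("can only compare IP addresses of the same version")
        return (self._addr > other._addr) - (self._addr < other._addr)

    def string_value(self) -> str:
        return self._text
```

For example, `IpValue("10.0.0.9").compare("10.0.0.10")` is negative even though the string `"10.0.0.9"` sorts after `"10.0.0.10"`.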

Additional notes:

  • For convenience, I think it makes sense to allow comparisons between IP addresses and strings, without needing to use the ip function. It's not strictly necessary, but it's easy to implement and probably more user friendly?
  • Since the cidrmatch function isn't merged yet, and the ip function is also relatively separate from the rest of the work, I may implement this as 2 or 3 successive PRs.

@normanj-bitquill
Contributor

normanj-bitquill commented Nov 22, 2024

@currantw I think that comparisons should support both IPv4 and IPv6 addresses. For example an IPv4 address could be compared to an IPv6 address.

To implement this, I think it makes sense to have all IPv6 addresses be greater than all IPv4 addresses.

Consider an index that has an IP field. The field contains both IPv4 and IPv6 addresses. We should be able to sort based on that column. Sorting needs to be able to compare values.

For IPv6, toString and stringValue do not necessarily need to return the original string. The address 12ab:0000:0000:: is the same as the address 12ab::. It should be acceptable for toString or stringValue to return a reduced but equivalent version of the original IP address. At the very least be sure not to return a longer but equivalent IP address.
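
A short sketch of both points, in Python with the standard `ipaddress` module (illustration only; the names `sort_key`, `mixed`, and `compressed` are made up for this example). Python refuses to order an IPv4 address against an IPv6 address directly, so the sort key puts the version first, which makes every IPv6 address greater than every IPv4 address:

```python
import ipaddress


def sort_key(text: str):
    addr = ipaddress.ip_address(text)
    # Version first, so all IPv4 addresses precede all IPv6 addresses; then numeric value.
    return (addr.version, int(addr))


mixed = ["2001:db8::1", "192.168.0.1", "::1", "10.0.0.1"]
ordered = sorted(mixed, key=sort_key)

# str() yields the compressed form, which is never longer than an equivalent input.
compressed = str(ipaddress.ip_address("12ab:0000:0000::"))
```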

@normanj-bitquill
Contributor

Another thing to consider is IPv4 address in IPv6 representation.
https://en.wikipedia.org/wiki/Reserved_IP_addresses#IPv6

This could mean that some IPv6 addresses are equal to some IPv4 addresses, such as:

::ffff:c0a8:203 = 192.168.2.3

@currantw
Author

Thanks @normanj-bitquill for the comments and suggestions. I think it makes sense to try (to the extent possible) to model the SQL plugin's IP address data type behaviour on the existing core OpenSearch behaviour.

  • Core OpenSearch does not support leading zeros in IPv4 addresses (e.g. 101.0.0.02) and simply fails. It should be relatively straightforward to support this in the SQL plugin (converting it to the canonical representation, e.g. 101.0.0.2), and it seems fine to me to be more permissive.
  • IPv4-mapped IPv6 addresses are automatically converted to IPv4 addresses (e.g. ::FFFF:1.2.3.4 is converted to 1.2.3.4).
  • Comparisons (including equality) are based on InetAddressPoint, which converts IPv4 addresses to IPv4-mapped IPv6 addresses and does a bit-wise comparison. For example, this means that ::FFFF:1.2.3.4, 1.2.3.4, and ::FFFF:0102:0304 are all equivalent.
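
The comparison scheme described in the last bullet can be sketched as follows. This is a Python approximation of the idea (encode every address as 16 bytes, with IPv4 in IPv4-mapped form, then compare the bytes); the names `encode` and `V4_MAPPED_PREFIX` are invented for this example, and it is not Lucene's or the plugin's actual code.

```python
import ipaddress

# ::ffff:0:0/96 prefix: ten zero bytes followed by 0xff 0xff.
V4_MAPPED_PREFIX = bytes(10) + b"\xff\xff"


def encode(text: str) -> bytes:
    """Encode an address as 16 bytes, representing IPv4 as an IPv4-mapped IPv6 address."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        return V4_MAPPED_PREFIX + addr.packed
    return addr.packed


# All three spellings from the example above collapse to one encoded value.
encodings = {encode("::FFFF:1.2.3.4"), encode("1.2.3.4"), encode("::FFFF:0102:0304")}

# Byte-wise order: note that "::1" sorts before any IPv4 address under this scheme.
loopback_before_v4 = encode("::1") < encode("1.2.3.4")
```

One consequence worth noting: under this encoding, IPv6 addresses numerically below ::ffff:0:0 sort before all IPv4 addresses, which differs from the "all IPv6 greater than all IPv4" ordering suggested earlier in the thread.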


5 participants