[FEATURE]Add geoip function to PPL for IP address geolocation #3038

Open
YANG-DB opened this issue Sep 16, 2024 · 8 comments · May be fixed by #3085
Labels
enhancement New feature or request PPL Piped processing language

Comments

@YANG-DB
Member

YANG-DB commented Sep 16, 2024

Description:
We propose adding a geoip function to OpenSearch's Piped Processing Language (PPL) and SQL to provide built-in IP address geolocation capabilities.
This feature would be similar to functionality used in OpenSearch's geospatial feature, enhancing PPL's ability to enrich log data with geographical information based on IP addresses.

Proposed Functionality:

  1. The 'geoip' function should take an IP address as input and return geographical information.
  2. It should support both IPv4 and IPv6 addresses.
  3. The function should return multiple fields including country, region, city, latitude, longitude, and others as available.
  4. It should allow users to specify which geolocation fields to include in the output.
  5. The function should use a regularly updated IP geolocation database for accuracy.
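
The points above can be read as a small lookup contract. The following is a minimal sketch in Python, assuming a hypothetical in-memory table `GEO_TABLE` of CIDR ranges in place of a real, regularly updated geolocation database. The function name `geoip_lookup` and the sample rows (documentation-reserved address ranges with made-up locations) are illustrative assumptions, not part of the proposal or of any plugin API.

```python
import ipaddress

# Hypothetical sample data: documentation-reserved ranges with made-up locations.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"country": "Exampleland", "city": "Sampleville", "lat": 10.0, "lon": 20.0}),
    (ipaddress.ip_network("2001:db8::/32"),
     {"country": "Testonia", "city": "Mockton", "lat": -5.5, "lon": 42.25}),
]


def geoip_lookup(ip):
    """Return all known geolocation fields for an IPv4 or IPv6 address, or None."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    for network, fields in GEO_TABLE:
        # Only compare an address against networks of the same IP version.
        if addr.version == network.version and addr in network:
            return dict(fields)  # copy so callers cannot mutate the table
    return None
```

Using the standard `ipaddress` module means the same code path handles IPv4 and IPv6 input; a real implementation would replace the linear scan with whatever index the geolocation data source provides.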

Example Usage:

... | eval geolocation = geoip(ip_field)

This would add a new field 'geolocation' with all available location information for the IP address in 'ip_field'.

... | eval country = geoip(ip_field, "country")
... | eval lat = geoip(ip_field, "lat"), lon = geoip(ip_field, "lon")

This would add new fields with specific geolocation information.

... | eval location_info = geoip(ip_field, "country,region,city,lat,lon")

This would add a new field 'location_info' with multiple pieces of geolocation data.
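
The three call forms above differ only in how the second argument selects fields. One possible reading, sketched in Python: no argument returns every field, a single name returns that field's value, and a comma-separated list returns a sub-record. The helper name `select_fields`, the sample record, and the choice to return `None` for unknown field names are assumptions made for illustration; the issue does not specify the return shapes.

```python
# Hypothetical record, standing in for the result of a geolocation lookup.
SAMPLE_RECORD = {
    "country": "Exampleland",
    "region": "North",
    "city": "Sampleville",
    "lat": 10.0,
    "lon": 20.0,
}


def select_fields(record, spec=None):
    """Apply an optional field selector such as "country" or "country,lat,lon"."""
    if spec is None:
        return dict(record)  # geoip(ip_field): all available fields
    names = [name.strip() for name in spec.split(",") if name.strip()]
    if len(names) == 1:
        return record.get(names[0])  # geoip(ip_field, "country"): single value
    # geoip(ip_field, "country,region,..."): sub-record, None for unknown names
    return {name: record.get(name) for name in names}
```

For example, `select_fields(SAMPLE_RECORD, "lat")` yields the latitude alone, while `select_fields(SAMPLE_RECORD, "country, city")` yields a two-key dictionary.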

Additional considerations

  • Allow the OpenSearch geospatial plugin to be used for IP-to-geolocation resolution

Related resources

@dblock
Member

dblock commented Oct 7, 2024

[Catch All Triage - 1, 2, 3, 4]

@kenrickyap

I am in the process of implementing this.

@kenrickyap kenrickyap linked a pull request Oct 21, 2024 that will close this issue
7 tasks
@kenrickyap

kenrickyap commented Oct 24, 2024

Hi @YANG-DB ,

What was the intended method of leveraging the geospatial plugin?

Following the example of how the job-scheduler and ml-commons plugins are included, I have been trying to import it directly into the project, but noticed that the geospatial plugin published on Maven has no JAR. As such, it does not seem possible to import the plugin directly. Is this assumption correct?

If so, my current plan is to call the endpoint that the geospatial plugin exposes in OpenSearch, documented here, and communicate with it using the OpenSearchRestClient. Would this be a good path forward, or am I missing something that would make it possible to expose the geospatial plugin?

Thanks!

@YANG-DB YANG-DB changed the title [FEATURE]Add iplocation function to PPL for IP address geolocation [FEATURE]Add geoip function to PPL for IP address geolocation Nov 2, 2024
@andy-k-improving
Contributor

Hi, @YANG-DB
After some discovery and feasibility checks, we have updated our approach. Below is the high-level plan along with the proposed code changes.
Can you have a look and advise?

High-level idea:

  • Create a new ActionType in the Geo-Spatial plugin to expose a new action that takes an IP string and returns the appropriate geo details.
  • Update existing SQL plugin accordingly to invoke the call on nodeClient with the newly created Geo-Spatial action for the geoip function.

Proposed code changes:

GeoSpatial:

  • Create a new TransportAction and register it accordingly:
    • Create a new TransportAction class, similar to GetDatasourceTransportAction, whose sole purpose is to process an incoming IP string with the given provider and return the appropriate geospatial detail fields.
    • Update the GeospatialPlugin.getAction( ) method to register the newly created action.
  • Create a new sub-module named geo-spatial-client that contains the nodeClientWrapper (the interface for cross-plugin interaction), a few ActionType definitions, and the wrapper objects that form the API signature for the return type.
  • Update Gradle script to publish geo-spatial-client module as a separate jar.

SQL module:

  • Update Gradle setting to import geo-spatial-client into OpenSearch sub-module.
  • Override the existing EvalOperator processing logic:
    • Create a new OpenSearchEvalOperator class that extends the existing EvalOperator and adds a NodeClient class property.
    • Update the OpenSearchIndex class to override the visitEval( ) method and return a new OpenSearchEvalOperator instance instead.
    • Update the OpenSearchEvalOperator to perform the following logic when processing geoip function:
      • Read the incoming IP string
      • Invoke a call on nodeClient with appropriate arguments and timeout value
      • Marshal the response and update the evalMap accordingly
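
The three processing steps above can be sketched in a language-neutral way. The Python below is only an illustration of the control flow under stated assumptions: `fake_geo_client` is a hypothetical stand-in for the cross-plugin call made through the node client, `eval_geoip` and its parameters are invented names, and falling back to `None` on timeout is one possible policy rather than the proposed behavior. It is not the Java OpenSearchEvalOperator.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def fake_geo_client(ip):
    """Hypothetical stand-in for the cross-plugin client call."""
    return {"country": "Exampleland", "city": "Sampleville"}


def eval_geoip(row, ip_field, target_field, client=fake_geo_client, timeout_s=1.0):
    """Return a copy of `row` with `target_field` set to the lookup result."""
    ip = row[ip_field]  # 1. read the incoming IP string
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(client, ip)  # 2. invoke the client call
        try:
            response = future.result(timeout=timeout_s)
        except FutureTimeout:
            response = None  # assumed policy: a timed-out lookup yields no value
    finally:
        # Do not block on a slow call; the worker thread finishes in the background.
        pool.shutdown(wait=False)
    enriched = dict(row)  # 3. marshal the response into the output row
    enriched[target_field] = response
    return enriched
```

Calling `eval_geoip({"client_ip": "192.0.2.55"}, "client_ip", "geolocation")` returns a new row with a `geolocation` entry and leaves the input row unchanged. Note that the timeout here only stops the caller from waiting; it does not cancel the underlying call.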

@kedbirhan

kedbirhan commented Nov 13, 2024

We don't even have a way to do basic IP address lookups. Why are you working on the next level before there is a basic way to query the ip field type?

@andy-k-improving
Contributor

Hi @kedbirhan, thanks for the feedback, and indeed that makes sense.

For now we are only proposing the high-level changes required for the functionality; we have not yet reached the implementation phase.

I believe that by the time the design for this ticket is finalized, #3145 should already be wrapped up and provide the IP type support.

Thanks,

@YANG-DB
Member Author

YANG-DB commented Nov 15, 2024

@andy-k-improving
I really like this idea - can you please create an RFC for the Geospatial plugin suggesting this change?
We would need their feedback

@andy-k-improving
Contributor

andy-k-improving commented Nov 18, 2024

@andy-k-improving I really like this idea - can you please create an RFC for the Geospatial plugin suggesting this change? We would need their feedback

@YANG-DB see below for the RFC on Geo spatial side.
opensearch-project/geospatial#698
I will proceed to work on the implementation on GeoSpatial side.

Projects
Status: Todo

5 participants