-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PPL geoip function #871
PPL geoip function #871
Conversation
Hi @YANG-DB, I heard from Anas that you had a method of implementing ip2geo functionality for Spark. I wanted to check with you that our current approach aligns with your method. Current Plan:
This PR has a stub udf implementation for better idea of how this would be implemented. All of this would have to be implemented within the Spark library, as currently I am not aware of how to access any geospatial artifacts. If you know of a better way to implement ip2geo please let me know! Thanks! |
ppl-spark-integration/src/main/java/org/opensearch/sql/expression/function/SerializableUdf.java
Outdated
Show resolved
Hide resolved
@kenrickyap please update the DCO (contributor sign-off) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kenrickyap I'm missing a more detailed design of the actual functionality that should include:
- flow diagram of loading the geoIp data
- storage of that data
- utilizing of different
datasource
- pro/cons for the different approaches
please add the former discussions made into this issue for better traceability
ppl-spark-integration/src/main/java/org/opensearch/sql/expression/function/SerializableUdf.java
Outdated
Show resolved
Hide resolved
import org.opensearch.sql.ast.expression.When; | ||
import org.opensearch.sql.ast.expression.WindowFunction; | ||
import org.opensearch.sql.ast.expression.Xor; | ||
import org.opensearch.sql.ast.expression.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep the existing explicit import list ...
Flow diagram for ip2geo UDF Design Details
Pros
Cons
|
@kenrickyap a few questions here:
|
Yes we will create Ip2GeoCache to be a singleton. From my understanding this should allow the cache to be accessible between sessions
Will update the flow diagram to reflect facade for different datasource. However, would it be possible to provide a service (API) example usage? I am not to sure what such a datasource would be expected to return. Also would there be a fixed schema for an index in opensource?
In hindsight I will just use a hashmap to store cidr geo_data where the cidr bitstring will be the key and geo_data will be a value. To preform the ip cidr matching will implement lookup function to convert ip to bitstring and reduce bit length till key is found in map. Initially I wanted to leverage the
As mentioned above will use a hash-map instead of trie.
Will have pseudo-code and code added by EOD |
docs/ppl-lang/planning/ppl-geoip.md
Outdated
|
||
geoip function to add information about the geographical location of an IPv4 or IPv6 address | ||
|
||
1. **Proposed syntax** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps also list out the values that can be used for properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need a larger set of examples here plz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added additional examples in ppl_ip.md doc
@@ -894,6 +895,10 @@ coalesceFunctionName | |||
: COALESCE | |||
; | |||
|
|||
geoIpFunctionCall | |||
: GEOIP LT_PRTHS (datasource = functionArg COMMA)? ipAddress = functionArg (COMMA properties = stringLiteral)? RT_PRTHS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do further parsing here of properties
. Something like:
... (COMMA properties = geoipPropertiesList)? ...
geoipPropertiesList
: geoipProperty (COMMA geoipProperty)*
geoipProperty
: city
| lat
| lon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea !!
import org.opensearch.sql.ast.expression.Field; | ||
import org.opensearch.sql.ast.expression.FieldList; | ||
import org.opensearch.sql.ast.expression.LambdaFunction; | ||
import org.opensearch.sql.ast.expression.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid using *
imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. please config your IDE to disable the auto-merging.
...on/src/test/scala/org/opensearch/flint/spark/ppl/PPLLogicalPlanEvalTranslatorTestSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kenrickyap can u plz update the PR status and progress ?
thanks
ppl-spark-integration/src/main/java/org/opensearch/sql/common/geospatial/CidrGeoMap.java
Outdated
Show resolved
Hide resolved
@YANG-DB we are nearly done implementation, am currently testing implementation of geoip functionality on TestDatasourceDao that provides a mock stream of geo data. @jduo is implementing the manifest dao, in parallel Also would it be ok to move API and OpenSearch index DAO implementation to another ticket? We are not sure what the specifications for this would be as there are no examples, and am not sure this would be done by Nov 15th. |
@kenrickyap plz fix the DCO error ... |
docs/ppl-lang/planning/ppl-geoip.md
Outdated
|
||
geoip function to add information about the geographical location of an IPv4 or IPv6 address | ||
|
||
1. **Proposed syntax** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need a larger set of examples here plz
ppl-spark-integration/src/main/java/org/opensearch/sql/common/geospatial/CidrGeoMap.java
Outdated
Show resolved
Hide resolved
ppl-spark-integration/src/main/java/org/opensearch/sql/common/geospatial/DatasourceDao.java
Outdated
Show resolved
Hide resolved
ppl-spark-integration/src/main/java/org/opensearch/sql/common/geospatial/TestDatasourceDao.java
Outdated
Show resolved
Hide resolved
@LantaoJin would you be able to provide a review? |
Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
Signed-off-by: kenrickyap <121634635+kenrickyap@users.noreply.github.com>
integ-test/src/integration/scala/org/opensearch/flint/spark/ppl/FlintSparkPPLGeoipITSuite.scala
Show resolved
Hide resolved
integ-test/src/integration/scala/org/opensearch/flint/spark/ppl/FlintSparkPPLGeoipITSuite.scala
Show resolved
Hide resolved
@kenrickyap I only conducted the review for half of the changes, will continue tomorrow, also checkstyle is failing on CI, would you mind to fix? |
@kenrickyap can u see why the build is failing ? |
fixed the styling issues |
...st/scala/org/opensearch/flint/spark/ppl/PPLLogicalPlanGeoipFunctionTranslatorTestSuite.scala
Outdated
Show resolved
Hide resolved
...st/scala/org/opensearch/flint/spark/ppl/PPLLogicalPlanGeoipFunctionTranslatorTestSuite.scala
Show resolved
Hide resolved
...tegration/src/main/java/org/opensearch/sql/ppl/utils/GeoIpCatalystLogicalPlanTranslator.java
Show resolved
Hide resolved
...tegration/src/main/java/org/opensearch/sql/ppl/utils/GeoIpCatalystLogicalPlanTranslator.java
Outdated
Show resolved
Hide resolved
...tegration/src/main/java/org/opensearch/sql/ppl/utils/GeoIpCatalystLogicalPlanTranslator.java
Outdated
Show resolved
Hide resolved
...tegration/src/main/java/org/opensearch/sql/ppl/utils/GeoIpCatalystLogicalPlanTranslator.java
Outdated
Show resolved
Hide resolved
...tegration/src/main/java/org/opensearch/sql/ppl/utils/GeoIpCatalystLogicalPlanTranslator.java
Outdated
Show resolved
Hide resolved
...tegration/src/main/java/org/opensearch/sql/ppl/utils/GeoIpCatalystLogicalPlanTranslator.java
Outdated
Show resolved
Hide resolved
...tegration/src/main/java/org/opensearch/sql/ppl/utils/GeoIpCatalystLogicalPlanTranslator.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Kenrick Yap <14yapkc1@gmail.com>
Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/CatalystQueryPlanVisitor.java
Show resolved
Hide resolved
ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/CatalystQueryPlanVisitor.java
Show resolved
Hide resolved
ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kenrickyap I've added a few comments
plz review - thanks
Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
@YANG-DB have addressed new PR comments, not sure if I fully understand some of the comments, have added my understanding of unaddressed comments |
Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK LGTM !
Thx for taking the time to address comments. :) |
Description
PPL geoip function
Issues Resolved
#672
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
This PR is a continuation of #781 due to lacking permissions to push to forked branch in said PR