* geoip function implementation
  Signed-off-by: Kenrick Yap <14yapkc1@gmail.com>
* Fixed integration tests
  Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
* linting
  Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
* addressing PR comments (added additional integ tests, doc changes)
  Signed-off-by: Kenrick Yap <14yapkc1@gmail.com>
* fixed new integ tests
  Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
* addressing pr comments
  Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
* address review comments
  Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
* moved validateGeoIpProperty to relevant class
  Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
* updated scalaudf function descriptions
  Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>

---------
Signed-off-by: Kenrick Yap <14yapkc1@gmail.com>
Signed-off-by: Kenrick Yap <kenrick.yap@improving.com>
Signed-off-by: kenrickyap <121634635+kenrickyap@users.noreply.github.com>
Co-authored-by: Kenrick Yap <14yapkc1@gmail.com>
1 parent 957de4e, commit 20ef890. Showing 15 changed files with 1,314 additions and 34 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
## geoip syntax proposal | ||
|
||
The `geoip` function adds information about the geographical location of an IPv4 or IPv6 address. | ||
|
||
**Implementation syntax** | ||
- `... | eval geoinfo = geoip(ipAddress *[,properties])` | ||
- generic syntax | ||
- `... | eval geoinfo = geoip(ipAddress)` | ||
- retrieves all geo data | ||
- `... | eval geoinfo = geoip(ipAddress, city, location)` | ||
- retrieves only city and location | ||
|
||
**Implementation details** | ||
- The current implementation requires the user to have created a `geoip` table with the following schema: | ||
|
||
```SQL | ||
CREATE TABLE geoip ( | ||
cidr STRING, | ||
country_iso_code STRING, | ||
country_name STRING, | ||
continent_name STRING, | ||
region_iso_code STRING, | ||
region_name STRING, | ||
city_name STRING, | ||
time_zone STRING, | ||
location STRING, | ||
ip_range_start BIGINT, | ||
ip_range_end BIGINT, | ||
ipv4 BOOLEAN | ||
) | ||
``` | ||
|
||
- `geoip` is resolved by joining the source against that table and projecting the matching geoip data as a struct. | ||
- For example, requesting the country and city through `geoip` is equivalent to running the following SQL query: | ||
|
||
```SQL | ||
SELECT source.*, struct(geoip.country_name, geoip.city_name) AS a | ||
FROM source, geoip | ||
WHERE geoip.ip_range_start <= ip_to_int(source.ip) | ||
AND geoip.ip_range_end > ip_to_int(source.ip) | ||
AND geoip.ipv4 = is_ipv4(source.ip); | ||
``` | ||
- When only one property is provided in the function call, `geoip` returns that property as a string instead of a struct: | ||
|
||
```SQL | ||
SELECT source.*, geoip.country_name AS a | ||
FROM source, geoip | ||
WHERE geoip.ip_range_start <= ip_to_int(source.ip) | ||
AND geoip.ip_range_end > ip_to_int(source.ip) | ||
AND geoip.ipv4 = is_ipv4(source.ip); | ||
``` | ||
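
The lookup semantics of the two queries above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Spark implementation: the `geoip` function below, the in-memory `GEOIP_ROWS` list, and its sample values are hypothetical, and `ip_to_int` and `is_ipv4` are stand-ins for the functions named in the SQL. The sketch uses the same half-open range test (`ip_range_start <= ip < ip_range_end`), matches on address family, and returns a string for a single property or a dict (standing in for the struct) otherwise.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Stand-in for ip_to_int in the SQL: the address as an integer."""
    return int(ipaddress.ip_address(ip))


def is_ipv4(ip: str) -> bool:
    """Stand-in for is_ipv4 in the SQL: True for IPv4, False for IPv6."""
    return ipaddress.ip_address(ip).version == 4


# Hypothetical sample row; the values are made up for illustration.
# Assumption: ip_range_end is exclusive, matching the `>` comparison in the SQL.
_start = ip_to_int("192.0.2.0")
GEOIP_ROWS = [
    {
        "cidr": "192.0.2.0/24",
        "country_name": "Exampleland",
        "city_name": "Example City",
        "ip_range_start": _start,
        "ip_range_end": _start + 256,
        "ipv4": True,
    },
]

# Assumption: with no properties requested, these geo columns are returned.
GEO_COLUMNS = ("country_name", "city_name")


def geoip(ip: str, *properties: str, rows=GEOIP_ROWS):
    """Return geo data for ip, or None when no row matches.

    The SQL above is an inner join, so an unmatched source row would be
    dropped; returning None here is a simplification.
    """
    ip_int, v4 = ip_to_int(ip), is_ipv4(ip)
    for row in rows:
        in_range = row["ip_range_start"] <= ip_int < row["ip_range_end"]
        if in_range and row["ipv4"] == v4:
            if len(properties) == 1:
                # Single property: plain string instead of a struct.
                return row[properties[0]]
            names = properties or GEO_COLUMNS
            return {name: row[name] for name in names}
    return None


print(geoip("192.0.2.10", "country_name"))
print(geoip("192.0.2.10", "country_name", "city_name"))
```

A linear scan is used only to keep the sketch short; in the SQL form the matching is expressed as a join and left to the query engine.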
|
||
**Future plan for additional data-sources** | ||
|
||
- Currently, the only supported data source is a pre-existing geoip table defined within Spark. | ||
- There are plans to let users specify other data sources: | ||
- API data sources - users who have their own geoip provider will be able to configure and call its endpoints. | ||
- OpenSearch geospatial client - once the geospatial client is published, we can use it to access OpenSearch geo2ip functionality. | ||
- Connection parameters for additional data sources will be provided through Spark config options. |