Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geo_point value parsing seems inconsistent and documentation could be improved #24944

Closed
pandrese opened this issue May 29, 2017 · 4 comments
Closed
Assignees
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >bug

Comments

@pandrese
Copy link

Elasticsearch version:
5.4

Plugins installed: []
None

JVM version (java -version):
1.8.0_131

OS version (uname -a if on a Unix-like system):
Linux 13f1d2351e28 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
Confusing error messages are returned during indexing of geo_points, and even worse, in the case of using "ignore_malformed" in the type mapping, the geo_point field will have a wrong value which will be used in e.g. aggregations.

The core of the problem seems to be that using the object format for the geo_point, string values for lat/lon are as expected parsed and interpreted as floats. Where as in the geo_point array format a string will be interpreted as a geohash value. The error message returned when parsing a float as a geohash value is confusing (also considering that the decimal point is not a valid character in the geohash encoding). Could this somehow be improved to make this easier to catch and understand?

In the case where when specifying "ignore_malformed" in the type mapping, I would have expected that the field would be ignored in the document, and not that it is set to the best guess. Or did I misread the documentation?

Steps to reproduce:
The problem can be recreated by issuing the following commands:

curl -XPUT localhost:9200/geotest -d '{
  "mappings": {
    "test": {
      "properties": {
        "loc": {
          "type": "geo_point"
        }
      }
    }
  }
}'
curl -XPUT localhost:9200/geotesti -d '{
  "mappings": {
    "test": {
      "properties": {
        "loc": {
          "type": "geo_point",
          "ignore_malformed": "true"
        }
      }
    }
  }
}'

curl -XPUT localhost:9200/geotest/test/array -d '{"loc": [5, 55.5]}'
curl -XPUT localhost:9200/geotest/test/arrayerr -d '{"loc": ["5", "55.5"]}'
curl -XPUT localhost:9200/geotest/test/llerr -d '{
  "loc": {
    "lat": "55.5",
    "lon": "5"
  }
}'
curl -XPUT localhost:9200/geotest/test/llerr -d '{
  "loc": {
    "lat": 55.5,
    "lon": 5
  }
}'

curl -XPUT localhost:9200/geotesti/test/array -d '{"loc": [5, 55.5]}'
curl -XPUT localhost:9200/geotesti/test/arrayerr -d '{"loc": ["5", "55.5"]}'
curl -XPUT localhost:9200/geotesti/test/llerr -d '{
  "loc": {
    "lat": "55.5",
    "lon": "5"
  }
}'
curl -XPUT localhost:9200/geotesti/test/llerr -d '{
  "loc": {
    "lat": 55.5,
    "lon": 5
  }
}'

sleep 30

curl -XPOST localhost:9200/geotest/_search -d '{"query": { "match_all": {}}, "aggs": {"geo": {"geohash_grid": {"field": "loc"}}}}'

curl -XPOST localhost:9200/geotesti/_search -d '{"query": { "match_all": {}}, "aggs": {"geo": {"geohash_grid": {"field": "loc"}}}}'

Provide logs (if relevant):

++ curl -XPUT localhost:9200/geotest -d '{
  "mappings": {
    "test": {
      "properties": {
        "loc": {
          "type": "geo_point"
        }
      }
    }
  }
}'
{"acknowledged":true,"shards_acknowledged":true}

++ curl -XPUT localhost:9200/geotesti -d '{
  "mappings": {
    "test": {
      "properties": {
        "loc": {
          "type": "geo_point",
          "ignore_malformed": "true"
        }
      }
    }
  }
}'
{"acknowledged":true,"shards_acknowledged":true}

++ curl -XPUT localhost:9200/geotest/test/array -d '{"loc": [5, 55.5]}'
{"_index":"geotest","_type":"test","_id":"array","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}

++ curl -XPUT localhost:9200/geotest/test/arrayerr -d '{"loc": ["5", "55.5"]}'
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"illegal_argument_exception","reason":"illegal latitude value [269.12109375] for loc"}},"status":400}

++ curl -XPUT localhost:9200/geotest/test/llerr -d '{
  "loc": {
    "lat": "55.5",
    "lon": "5"
  }
}'
{"_index":"geotest","_type":"test","_id":"llerr","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}

++ curl -XPUT localhost:9200/geotest/test/ll -d '{
  "loc": {
    "lat": 55.5,
    "lon": 5
  }
}'
{"_index":"geotest","_type":"test","_id":"ll","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}

++ curl -XPUT localhost:9200/geotesti/test/array -d '{"loc": [5, 55.5]}'
{"_index":"geotesti","_type":"test","_id":"array","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}

++ curl -XPUT localhost:9200/geotesti/test/arrayerr -d '{"loc": ["5", "55.5"]}'
{"_index":"geotesti","_type":"test","_id":"arrayerr","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}

++ curl -XPUT localhost:9200/geotesti/test/llerr -d '{
  "loc": {
    "lat": "55.5",
    "lon": "5"
  }
}'
{"_index":"geotesti","_type":"test","_id":"llerr","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}

++ curl -XPUT localhost:9200/geotesti/test/ll -d '{
  "loc": {
    "lat": 55.5,
    "lon": 5
  }
}'
{"_index":"geotesti","_type":"test","_id":"ll","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}

++ sleep 30

++ curl -XPOST localhost:9200/geotest/_search -d '{"query": { "match_all": {}}, "size": 0, "aggs": {"geo": {"geohash_grid": {"field": "loc"}}}}'
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":3,"max_score":0.0,"hits":[]},"aggregations":{"geo":{"buckets":[{"key":"u1ge9","doc_count":3}]}}}

++ curl -XPOST localhost:9200/geotesti/_search -d '{"query": { "match_all": {}},  "size": 0 ,"aggs": {"geo": {"geohash_grid": {"field": "loc"}}}}'
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":4,"max_score":0.0,"hits":[]},"aggregations":{"geo":{"buckets":[{"key":"u1ge9","doc_count":3},{"key":"5bpj0","doc_count":1},{"key":"50000","doc_count":1}]}}}
@clintongormley clintongormley added :Analytics/Geo Indexing, search aggregations of geo points and shapes >bug labels May 30, 2017
@clintongormley
Copy link
Contributor

@pandrese thanks for such a good recreation

@nknize please could you take a look

@imotov imotov assigned imotov and unassigned nknize May 3, 2018
@imotov
Copy link
Contributor

imotov commented May 3, 2018

So, this is a tricky one. The problem here is that expression likes "loc": ["44", "55"] to a human eye might look like a location with lon: 44 and lat: 55. However, elasticsearch interprets it as 2 locations: one with geohash 44 and another with geohash 55. So, instead of a single location somewhere in Central Russia, we get two locations near Antarctica. When coordinates have dots and we ignore malformed locations, only one geopoint makes it in causing unexpected and difficult to interpret results. I am planning to add some validation to geohash parsing as part of fix for #23579, which will at least fix the error message, so instead of illegal latitude value [269.12109375] for loc we would get a more easily interpretable illegal symbol [.] in geohash [55.5], but besides this, I am not really sure how to deal with this.

@pandrese
Copy link
Author

pandrese commented May 7, 2018

@imotov Your proposal for a better error message would help a lot to pinpoint the problem.

It has been a while, but as far as I remember when you set "ignore_malformed" on the type mapping, my expected behavior would be that malformed value should be ignored, and not saved in the document.

@imotov
Copy link
Contributor

imotov commented May 8, 2018

It has been a while, but as far as I remember when you set "ignore_malformed" on the type mapping, my expected behavior would be that malformed value should be ignored, and not saved in the document.

That is the behaviour. As I tried to explain above ["5", "55.5"] are parsed as 2 geohashes. The first geohash 5 is correct and stored as a geopoint 70°00'00.0"S 39°00'00.0"W (geohash 50000). The second value 55.5 is an incorrect geohash - it used to produce a confusing error illegal latitude value [269.12109375] for loc, after I merged #30376 it produces a more meaningful error unsupported symbol [.] in geohash [55.5]. If ignore_malformed is set to true the first "correct" value "5" is indexed and the second incorrect value "55.5" is ignored. If ignore_malformed is set to false the entire document is ignored, so both points are not indexed.

I cannot think of any additional work that can be done in order to improve the situation here. So, if there are no other suggestions, I am going to close this issue for now.

@imotov imotov closed this as completed May 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >bug
Projects
None yet
Development

No branches or pull requests

4 participants