Documentation for geohex_grid over geo_shape #92999

Merged on Jan 24, 2023 (4 commits). Changes shown from 1 commit.

43 changes: 33 additions & 10 deletions docs/reference/aggregations/bucket/geohexgrid-aggregation.asciidoc
@@ -5,8 +5,8 @@
<titleabbrev>Geohex grid</titleabbrev>
++++

A multi-bucket aggregation that groups <<geo-point,`geo_point`>>
values into buckets that represent a grid.
A multi-bucket aggregation that groups <<geo-point,`geo_point`>> and
<<geo-shape,`geo_shape`>> values into buckets that represent a grid.
The resulting grid can be sparse and only
contains cells that have matching data. Each cell corresponds to a
https://h3geo.org/docs/core-library/h3Indexing#h3-cell-indexp[H3 cell index] and is
@@ -18,7 +18,7 @@ Precision for this aggregation can be between 0 and 15, inclusive.

WARNING: High-precision requests can be very expensive in terms of RAM and
result sizes. For example, the highest-precision geohex with a precision of 15
produces cells that cover less than 10cm by 10cm. We recommend you use a
produces cells that cover less than one square metre. We recommend you use a
filter to limit high-precision requests to a smaller geographic area. For an example,
refer to <<geohexgrid-high-precision>>.

@@ -220,21 +220,43 @@ Response:
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]

[[geohexgrid-options]]
[discrete]
[role="xpack"]
[[geohexgrid-aggregating-geo-shape]]
==== Aggregating `geo_shape` fields

Aggregating on <<geo-shape>> fields works almost as it does for points. There are two key differences:

* When aggregating over `geo_point` data, points are considered within a hexagonal tile if they lie
within the edges defined by great circles. In other words the calculation is done using spherical coordinates.
However, when aggregating over `geo_shape` data, the shapes are considered within a hexagon if they lie
within the edges defined as straight lines on an equirectangular projection. The reason for this is that
visualizing aggregation results in a map application will show surprising results when zoomed out.

iverase (Contributor), Jan 18, 2023:

I think the reason is that Elasticsearch (more specifically, Lucene) treats edges using the equirectangular projection at search time, so the mismatch between the query result and the aggregation might provide surprising results.

Contributor Author:

In this case we should do this with points too, right? I know we cannot change points due to backwards compatibility, but it might be nice to have an explanation for doing the two differently.

Of course with points we have less risk due to only the cells having edges, while with shapes we have edges for both the shapes and the H3 cells, increasing the likelihood of something looking weird. But that does not seem like sufficient reason to use spherical for points.

iverase (Contributor), Jan 18, 2023:

The main issue is that we accept edges (polygons and lines) that cannot be represented in spherical coordinates (edges > 180 degrees). This alone makes it impossible to resolve geo_shape aggregations using spherical geometry.

Note that we use the equirectangular projection, but maps normally use the Mercator projection, so there is already a mismatch there.

Contributor Author:

OK. Understood. I used your initial explanation, and decided not to involve the visual artefact discussion at all. Also, the question of edges > 180 degrees could, presumably, be solved with sidedness (and orientation), but I understand from previous discussions we have a backwards compatibility issue there. So I simplified and did not bring that up here either.

Contributor:

Thanks! That makes more sense to me.

For most data, the difference is subtle or not noticed.
However, for low zoom levels (low precision), especially far from the equator, this can be noticeable.
For example, if the same point data is indexed as `geo_point` and `geo_shape`, it is possible to get
different results when aggregating at lower resolutions.
* As is the case with <<geotilegrid-aggregating-geo-shape,`geotile_grid`>>,
a single shape can be counted for in multiple tiles. A shape will contribute to the count of matching values
if any part of its shape intersects with that tile. Below is an image that demonstrates this:


image:images/spatial/geoshape_hexgrid.png[]
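
The difference between great-circle edges and straight edges on an equirectangular projection can be illustrated numerically. The following is a minimal sketch, not the Elasticsearch or H3 implementation: the function names are hypothetical, and it uses two points on the same parallel rather than real hexagon vertices. It compares the latitude of the edge midpoint under both interpretations.

```python
import math


def great_circle_midpoint_lat(lat_deg, lon1_deg, lon2_deg):
    """Latitude of the great-circle midpoint of two points on the same parallel."""
    lat = math.radians(lat_deg)

    def to_unit_vector(lon_deg):
        lon = math.radians(lon_deg)
        return (
            math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat),
        )

    a = to_unit_vector(lon1_deg)
    b = to_unit_vector(lon2_deg)
    # The midpoint of the great-circle arc is the normalized sum of the endpoints.
    summed = tuple(x + y for x, y in zip(a, b))
    norm = math.sqrt(sum(c * c for c in summed))
    return math.degrees(math.asin(summed[2] / norm))


def equirectangular_midpoint_lat(lat_deg, lon1_deg, lon2_deg):
    """On an equirectangular projection the edge is a straight line in
    (lon, lat) space, so between two points on one parallel it stays there."""
    return lat_deg


gc_lat = great_circle_midpoint_lat(60.0, 0.0, 90.0)
flat_lat = equirectangular_midpoint_lat(60.0, 0.0, 90.0)
print(f"great circle: {gc_lat:.2f}, equirectangular: {flat_lat:.2f}")
```

For this deliberately long edge at 60 degrees north, the great-circle midpoint lies several degrees further north than the straight-line midpoint. A point or shape located between the two curves would fall on different sides of the edge depending on which interpretation is used, which is why the effect is most visible at low precision and far from the equator.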

==== Options

[horizontal]
field::
(Required, string) Field containing indexed geo-point values. Must be explicitly
mapped as a <<geo-point,`geo_point`>> field. If the field contains an array,
`geohex_grid` aggregates all array values.
(Required, string) Field containing indexed geo-point or geo-shape values.
Must be explicitly mapped as a <<geo-point,`geo_point`>> or a <<geo-shape,`geo_shape`>> field.
If the field contains an array, `geohex_grid` aggregates all array values.

precision::
(Optional, integer) Integer zoom of the key used to define cells/buckets in
the results. Defaults to `6`. Values outside of [`0`,`15`] will be rejected.

bounds::
(Optional, object) Bounding box used to filter the geo-points in each bucket.
(Optional, object) Bounding box used to filter the geo-points or geo-shapes in each bucket.
Accepts the same bounding box formats as the
<<query-dsl-geo-bounding-box-query-accepted-formats,geo-bounding box query>>.

@@ -245,5 +267,6 @@ documents they contain.

shard_size::
(Optional, integer) Number of buckets returned from each shard. Defaults to
`max(10,(size x number-of-shards))` to allow for more a accurate count of the
top cells in the final result.
`max(10,(size x number-of-shards))` to allow for a more accurate count of the
top cells in the final result. Since each shard could have a different top result order,
using a larger number here reduces the risk of inaccurate counts, but incurs a performance cost.
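
As a rough illustration of how the options above fit together, the sketch below builds a `geohex_grid` search body as a Python dictionary. The helper name `geohex_grid_request`, the aggregation name `grid`, the field name `geometry`, and the coordinates are assumptions made for the example. Only the `field`, `precision`, and `bounds` parameters and the 0 to 15 precision range come from the documentation text.

```python
import json


def geohex_grid_request(field, precision=6, bounds=None):
    """Build a search body containing a single geohex_grid aggregation.

    Hypothetical helper: it mirrors the documented rule that precision
    values outside [0, 15] are rejected.
    """
    if not 0 <= precision <= 15:
        raise ValueError("precision must be between 0 and 15, inclusive")
    agg = {"field": field, "precision": precision}
    if bounds is not None:
        agg["bounds"] = bounds
    return {"size": 0, "aggs": {"grid": {"geohex_grid": agg}}}


# "geometry" is an assumed field mapped as geo_point or geo_shape.
body = geohex_grid_request(
    "geometry",
    precision=4,
    bounds={
        "top_left": {"lat": 53.4375, "lon": 4.21875},
        "bottom_right": {"lat": 52.03125, "lon": 5.625},
    },
)
print(json.dumps(body, indent=2))
```

The resulting JSON could be sent as the body of a `_search` request. The `bounds` object here uses the `top_left` and `bottom_right` form, one of the formats accepted by the geo-bounding box query.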
58 changes: 36 additions & 22 deletions docs/reference/aggregations/bucket/geotilegrid-aggregation.asciidoc
@@ -17,7 +17,7 @@ cover only a small area.
* Low precision keys have a smaller range for x and y, and represent tiles that
each cover a large area.

See https://wiki.openstreetmap.org/wiki/Zoom_levels[Zoom level documentation]
See https://wiki.openstreetmap.org/wiki/Zoom_levels[zoom level documentation]
on how precision (zoom) correlates to size on the ground. Precision for this
aggregation can be between 0 and 29, inclusive.

@@ -102,14 +102,15 @@ Response:
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]

[[geotilegrid-high-precision]]
==== High-precision requests

When requesting detailed buckets (typically for displaying a "zoomed in" map)
When requesting detailed buckets (typically for displaying a "zoomed in" map),
a filter like <<query-dsl-geo-bounding-box-query,geo_bounding_box>> should be
applied to narrow the subject area otherwise potentially millions of buckets
applied to narrow the subject area. Otherwise, potentially millions of buckets
will be created and returned.

[source,console]
[source,console,id=geotilegrid-high-precision-ex]
--------------------------------------------------
POST /museums/_search?size=0
{
@@ -137,6 +138,8 @@ POST /museums/_search?size=0
--------------------------------------------------
// TEST[continued]

Response:

[source,console-result]
--------------------------------------------------
{
@@ -166,13 +169,14 @@ POST /museums/_search?size=0
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]

[[geotilegrid-addtl-bounding-box-filtering]]
==== Requests with additional bounding box filtering

The `geotile_grid` aggregation supports an optional `bounds` parameter
that restricts the cells considered to those that intersects the
bounds provided. The `bounds` parameter accepts the bounding box in
all the same <<query-dsl-geo-bounding-box-query-accepted-formats,accepted formats>> of the
bounds specified in the Geo Bounding Box Query. This bounding box can be used with or
that restricts the cells considered to those that intersect the
provided bounds. The `bounds` parameter accepts the same
<<query-dsl-geo-bounding-box-query-accepted-formats,bounding box formats>>
as the geo-bounding box query. This bounding box can be used with or
without an additional `geo_bounding_box` query for filtering the points prior to aggregating.
It is an independent bounding box that can intersect with, be equal to, or be disjoint
to any additional `geo_bounding_box` queries defined in the context of the aggregation.
@@ -197,6 +201,8 @@ POST /museums/_search?size=0
--------------------------------------------------
// TEST[continued]

Response:

[source,console-result]
--------------------------------------------------
{
@@ -225,9 +231,10 @@ POST /museums/_search?size=0

[discrete]
[role="xpack"]
[[geotilegrid-aggregating-geo-shape]]
==== Aggregating `geo_shape` fields

Aggregating on <<geo-shape>> fields works just as it does for points, except that a single
Aggregating on <<geo-shape>> fields works almost as it does for points, except that a single
shape can be counted for in multiple tiles. A shape will contribute to the count of matching values
if any part of its shape intersects with that tile. Below is an image that demonstrates this:

@@ -237,20 +244,27 @@ image:images/spatial/geoshape_grid.png[]
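
The multi-tile counting rule for shapes can be sketched with a toy grid. This is a simplified model under stated assumptions, not the actual `geotile_grid` tiling: cells are plain squares in an arbitrary coordinate space, shapes are represented only by their bounding boxes, and all names are hypothetical. It shows that a shape adds one to the count of every cell it intersects, so bucket counts can sum to more than the number of shapes.

```python
from collections import Counter


def cells_intersecting(bbox, cell_size):
    """Return the (x, y) indices of all square cells a bounding box touches."""
    min_x, min_y, max_x, max_y = bbox
    x_first, x_last = int(min_x // cell_size), int(max_x // cell_size)
    y_first, y_last = int(min_y // cell_size), int(max_y // cell_size)
    return [
        (x, y)
        for x in range(x_first, x_last + 1)
        for y in range(y_first, y_last + 1)
    ]


def count_per_cell(shapes, cell_size):
    """Count each shape once in every cell it intersects."""
    counts = Counter()
    for bbox in shapes:
        for cell in cells_intersecting(bbox, cell_size):
            counts[cell] += 1
    return counts


shapes = [
    (1.0, 1.0, 2.0, 2.0),    # lies entirely inside cell (0, 0)
    (8.0, 8.0, 12.0, 12.0),  # straddles the corner shared by four cells
]
counts = count_per_cell(shapes, cell_size=10.0)
for cell, count in sorted(counts.items()):
    print(cell, count)
```

Two shapes produce a total count of five across four cells, because the second shape is counted in each of the four cells it overlaps.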
==== Options

[horizontal]
field:: Mandatory. The name of the field indexed with GeoPoints.
field::
(Required, string) Field containing indexed geo-point or geo-shape values.
Must be explicitly mapped as a <<geo-point,`geo_point`>> or a <<geo-shape,`geo_shape`>> field.
If the field contains an array, `geotile_grid` aggregates all array values.

precision:: Optional. The integer zoom of the key used to define
cells/buckets in the results. Defaults to 7.
Values outside of [0,29] will be rejected.
precision::
(Optional, integer) Integer zoom of the key used to define cells/buckets in
the results. Defaults to `7`. Values outside of [`0`,`29`] will be rejected.

bounds: Optional. The bounding box to filter the points in the bucket.
bounds::
(Optional, object) Bounding box used to filter the geo-points or geo-shapes in each bucket.
Accepts the same bounding box formats as the
<<query-dsl-geo-bounding-box-query-accepted-formats,geo-bounding box query>>.

size:: Optional. The maximum number of geohash buckets to return
(defaults to 10,000). When results are trimmed, buckets are
prioritised based on the volumes of documents they contain.
size::
(Optional, integer) Maximum number of buckets to return. Defaults to 10,000.
When results are trimmed, buckets are prioritized based on the volume of
documents they contain.

shard_size:: Optional. To allow for more accurate counting of the top cells
returned in the final result the aggregation defaults to
returning `max(10,(size x number-of-shards))` buckets from each
shard. If this heuristic is undesirable, the number considered
from each shard can be over-ridden using this parameter.
shard_size::
(Optional, integer) Number of buckets returned from each shard. Defaults to
`max(10,(size x number-of-shards))` to allow for a more accurate count of the
top cells in the final result. Since each shard could have a different top result order,
using a larger number here reduces the risk of inaccurate counts, but incurs a performance cost.
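
The default `shard_size` formula quoted in both option lists is simple to evaluate by hand. A minimal sketch, with a hypothetical function name, that applies `max(10,(size x number-of-shards))` as written in the text:

```python
def default_shard_size(size, number_of_shards):
    """Evaluate the documented default: max(10, size x number-of-shards)."""
    return max(10, size * number_of_shards)


# A small request still asks each shard for at least 10 buckets.
small = default_shard_size(size=3, number_of_shards=2)

# With the documented geotile_grid default size of 10,000 and 5 shards,
# each shard is asked for 50,000 buckets.
large = default_shard_size(size=10_000, number_of_shards=5)

print(small, large)
```

Setting `shard_size` explicitly replaces this computed value, which trades accuracy of the top-cell counts against the per-shard work.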