Add geo_line aggregation #41612

Merged on Nov 23, 2020 (34 commits)
db64ace
insilico
talevy Mar 2, 2020
0657012
update code to work
talevy Aug 25, 2020
3089cc2
fix more issues
talevy Aug 26, 2020
d03a4b4
fix more tests
talevy Aug 27, 2020
7de7ce3
insilico
talevy Aug 27, 2020
1f5ee27
remove resizing logic
talevy Aug 31, 2020
d4c4550
add more
talevy Sep 1, 2020
4bbc175
add circuit breaker
talevy Sep 14, 2020
8afb5f6
add telemetry to geo_line
talevy Sep 14, 2020
6d70ccd
Refactor to leverage BucketedSort
talevy Oct 14, 2020
285ffd9
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Oct 29, 2020
baaea0e
add size param
talevy Oct 29, 2020
bb8c686
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Oct 30, 2020
708be62
fix up some tests
talevy Oct 30, 2020
6a3ebae
fix final reduction
talevy Oct 30, 2020
5174204
add optional [size] param
talevy Nov 2, 2020
9a49808
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Nov 2, 2020
fe755ef
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Nov 3, 2020
a5c32ce
fix tests
talevy Nov 4, 2020
dab627e
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Nov 5, 2020
1a71f7b
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Nov 10, 2020
c256eb9
move AnyMultiValueSource to GeoLineMultiValueSource
talevy Nov 10, 2020
8d12c67
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Nov 12, 2020
8fa5b91
use priority queue when reducing (wip - broken)
talevy Nov 16, 2020
a423e76
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Nov 17, 2020
37bf8f6
fix internalgeolinetests
talevy Nov 17, 2020
e37a935
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Nov 17, 2020
be487a0
cleanup and add docs
talevy Nov 18, 2020
d430453
update geo_line license to Gold
talevy Nov 18, 2020
8f5dcd7
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Nov 18, 2020
f64ce61
add missing test and add docs
talevy Nov 18, 2020
ebff212
resolve some docs issues
talevy Nov 18, 2020
781e7c4
Merge remote-tracking branch 'elastic/master' into geo_line
talevy Nov 23, 2020
1971ce5
guard for empty internalgeolines
talevy Nov 23, 2020
2 changes: 2 additions & 0 deletions docs/reference/aggregations/metrics.asciidoc
Expand Up @@ -23,6 +23,8 @@ include::metrics/geobounds-aggregation.asciidoc[]

include::metrics/geocentroid-aggregation.asciidoc[]

include::metrics/geoline-aggregation.asciidoc[]

include::metrics/matrix-stats-aggregation.asciidoc[]

include::metrics/max-aggregation.asciidoc[]
143 changes: 143 additions & 0 deletions docs/reference/aggregations/metrics/geoline-aggregation.asciidoc
@@ -0,0 +1,143 @@
[role="xpack"]
[testenv="gold"]
[[search-aggregations-metrics-geo-line]]
=== Geo-Line Aggregation
++++
<titleabbrev>Geo-Line</titleabbrev>
++++

The `geo_line` aggregation aggregates all `geo_point` values within a bucket into a LineString ordered
by the chosen `sort` field. This `sort` can be a date field, for example. The bucket returned is a valid
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] representing the line geometry.

[source,console,id=search-aggregations-metrics-geo-line-simple]
----
PUT test
{
"mappings": {
"dynamic": "strict",
"_source": {
"enabled": false
},
"properties": {
"my_location": {
"type": "geo_point"
},
"group": {
"type": "keyword"
},
"@timestamp": {
"type": "date"
}
}
}
}

POST /test/_bulk?refresh
{"index": {}}
{"my_location": {"lat": 37.3450570, "lon": -122.0499820}, "@timestamp": "2013-09-06T16:00:36Z"}
{"index": {}}
{"my_location": {"lat": 37.3451320, "lon": -122.0499820}, "@timestamp": "2013-09-06T16:00:37Z"}
{"index": {}}
{"my_location": {"lat": 37.349283, "lon": -122.0505010}, "@timestamp": "2013-09-06T16:00:37Z"}

POST /test/_search?filter_path=aggregations
{
"aggs": {
"line": {
"geo_line": {
"point": {"field": "my_location"},
"sort": {"field": "@timestamp"}
}
}
}
}
----

Which returns:

[source,js]
----
{
"aggregations": {
"line": {
"type" : "Feature",
"geometry" : {
"type" : "LineString",
"coordinates" : [
[
-122.049982,
37.345057
],
[
-122.050501,
37.349283
],
[
-122.049982,
37.345132
]
]
},
"properties" : {
"complete" : true
}
}
}
}
----
// TESTRESPONSE
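
The ordering described above can be sketched in plain Java. This is a minimal illustration, not the Elasticsearch implementation: the `GeoLineSketch` class, the `SortedPoint` type, and the sample sort values are hypothetical. It sorts points by an ascending sort key and emits each position as `[longitude, latitude]`, the coordinate order GeoJSON uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; not the aggregation's actual code. */
final class GeoLineSketch {

    /** One document's point plus the value it is sorted by (hypothetical type). */
    static final class SortedPoint {
        final double lat;
        final double lon;
        final long sortValue;

        SortedPoint(double lat, double lon, long sortValue) {
            this.lat = lat;
            this.lon = lon;
            this.sortValue = sortValue;
        }
    }

    /**
     * Orders the points by ascending sort value and returns GeoJSON-style
     * coordinates, where each position is [longitude, latitude].
     */
    static List<double[]> toLineCoordinates(List<SortedPoint> points) {
        List<SortedPoint> ordered = new ArrayList<>(points);
        // List.sort is stable, so points with equal sort values keep their input order here.
        ordered.sort(Comparator.comparingLong((SortedPoint p) -> p.sortValue));
        List<double[]> coordinates = new ArrayList<>();
        for (SortedPoint p : ordered) {
            coordinates.add(new double[] { p.lon, p.lat });
        }
        return coordinates;
    }
}
```

In this sketch ties are resolved by input order; the order the aggregation itself uses for equal sort values is not specified in the text above.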

[[search-aggregations-metrics-geo-line-options]]
==== Options

`point`::
(Required)

This option specifies the name of the `geo_point` field.

Example usage configuring `my_location` as the point field:

[source,js]
----
"point": {
"field": "my_location"
}
----
// NOTCONSOLE

`sort`::
(Required)

This option specifies the name of the numeric field to use as the sort key
for ordering the points.

Example usage configuring `@timestamp` as the sort key:

[source,js]
----
"sort": {
"field": "@timestamp"
}
----
// NOTCONSOLE

`include_sort`::
(Optional, boolean, default: `false`)

When `true`, this option includes an additional array of the sort values in the
feature properties.

`sort_order`::
(Optional, string, default: `"ASC"`)

This option accepts one of two values: "ASC" or "DESC".

The line is sorted by the sort key in ascending order when set to "ASC", and in
descending order when set to "DESC".

`size`::
(Optional, integer, default: `10000`)

The maximum number of points in the line represented in the aggregation. Valid
sizes are between 1 and 10000.
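
How `sort_order` and `size` might interact can be sketched in Java. Everything here is an assumption made for illustration: the `GeoLineOptionsSketch` class and its `select` method are hypothetical, and the sketch assumes that when a bucket holds more points than `size`, the first `size` entries in sort order are kept and `complete` becomes `false`. The documentation above only shows `complete: true`, so treat the truncation behavior as a guess rather than a specification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of the `sort_order` and `size` options. */
final class GeoLineOptionsSketch {
    static final int MAX_SIZE = 10000; // upper bound stated in the option description

    /** The kept sort values and whether no point had to be dropped. */
    static final class Result {
        final List<Long> sortValues;
        final boolean complete;

        Result(List<Long> sortValues, boolean complete) {
            this.sortValues = sortValues;
            this.complete = complete;
        }
    }

    static Result select(List<Long> sortValues, String sortOrder, int size) {
        if (size < 1 || size > MAX_SIZE) {
            throw new IllegalArgumentException("[size] must be between 1 and " + MAX_SIZE);
        }
        List<Long> ordered = new ArrayList<>(sortValues);
        if ("ASC".equals(sortOrder)) {
            Collections.sort(ordered);
        } else if ("DESC".equals(sortOrder)) {
            ordered.sort(Collections.reverseOrder());
        } else {
            throw new IllegalArgumentException("[sort_order] must be ASC or DESC");
        }
        // Assumption: extra points beyond `size` are dropped from the end of the sorted line.
        boolean complete = ordered.size() <= size;
        List<Long> kept = new ArrayList<>(ordered.subList(0, Math.min(size, ordered.size())));
        return new Result(kept, complete);
    }
}
```

The returned `sortValues` list plays the role of the extra array that `include_sort` adds to the feature properties.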
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@
* worst case. Critically, it is a very fast {@code O(1)} to check if a value
* is competitive at all which, so long as buckets aren't hit in reverse
* order, they mostly won't be. Extracting results in sorted order is still
* {@code O(n * log n)}.
* {@code O(n * log n)}.
* </p>
* <p>
* When we first collect a bucket we make sure that we've allocated enough
Expand All @@ -90,7 +90,7 @@ public interface ExtraData {
* <p>
* Both parameters will have previously been loaded by
* {@link Loader#loadFromDoc(long, int)} so the implementer shouldn't
* need to grow the underlying storage to implement this.
* need to grow the underlying storage to implement this.
* </p>
*/
void swap(long lhs, long rhs);
Expand Down Expand Up @@ -128,7 +128,7 @@ public Loader loader(LeafReaderContext ctx) throws IOException {
private final SortOrder order;
private final DocValueFormat format;
private final int bucketSize;
private final ExtraData extra;
protected final ExtraData extra;
/**
* {@code true} if the bucket is in heap mode, {@code false} if
* it is still gathering.
Expand Down Expand Up @@ -206,9 +206,9 @@ public final List<SortValue> getValues(long bucket) throws IOException {
}

/**
* Is this bucket a min heap {@code true} or in gathering mode {@code false}?
* Is this bucket a min heap {@code true} or in gathering mode {@code false}?
*/
private boolean inHeapMode(long bucket) {
public boolean inHeapMode(long bucket) {
return heapMode.get(bucket);
}

Expand Down Expand Up @@ -254,7 +254,7 @@ private boolean inHeapMode(long bucket) {
/**
* {@code true} if the entry at index {@code lhs} is "better" than
* the entry at {@code rhs}. "Better" in this context means "lower" for
* {@link SortOrder#ASC} and "higher" for {@link SortOrder#DESC}.
* {@link SortOrder#ASC} and "higher" for {@link SortOrder#DESC}.
*/
protected abstract boolean betterThan(long lhs, long rhs);

Expand Down Expand Up @@ -283,7 +283,7 @@ protected final String debugFormat() {

/**
* Initialize the gather offsets after setting up values. Subclasses
* should call this once, after setting up their {@link #values()}.
* should call this once, after setting up their {@link #values()}.
*/
protected final void initGatherOffsets() {
setNextGatherOffsets(0);
Expand Down Expand Up @@ -325,12 +325,12 @@ private void setNextGatherOffsets(long startingAt) {
* case.
* </p>
* <ul>
* <li>Hayward, Ryan; McDiarmid, Colin (1991).
* <li>Hayward, Ryan; McDiarmid, Colin (1991).
* <a href="https://web.archive.org/web/20160205023201/http://www.stats.ox.ac.uk/__data/assets/pdf_file/0015/4173/heapbuildjalg.pdf">
* Average Case Analysis of Heap Building by Repeated Insertion</a> J. Algorithms.
* <li>D.E. Knuth, "The Art of Computer Programming, Vol. 3, Sorting and Searching"</li>
* </ul>
* @param rootIndex the index the start of the bucket
* @param rootIndex the index the start of the bucket
*/
private void heapify(long rootIndex) {
int maxParent = bucketSize / 2 - 1;
Expand All @@ -344,7 +344,7 @@ private void heapify(long rootIndex) {
* runs in {@code O(log n)} time.
* @param rootIndex index of the start of the bucket
* @param parent Index within the bucket of the parent to check.
* For example, 0 is the "root".
* For example, 0 is the "root".
*/
private void downHeap(long rootIndex, int parent) {
while (true) {
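
The `heapify` and `downHeap` steps touched by these hunks follow the usual array-backed binary heap pattern. Below is a simplified sketch under stated assumptions: it works on a single `long[]` rather than on many buckets stored at a `rootIndex` offset, it always builds a min-heap, and the `HeapSketch` class with its method signatures is hypothetical rather than the `BucketedSort` code.

```java
/** Hypothetical single-bucket illustration of heapify/downHeap. */
final class HeapSketch {

    /** Rearranges {@code values} in place into a binary min-heap. */
    static void heapify(long[] values) {
        int maxParent = values.length / 2 - 1; // last index that has at least one child
        for (int parent = maxParent; parent >= 0; parent--) {
            downHeap(values, parent);
        }
    }

    /** Sinks the entry at {@code parent} until neither child is smaller; O(log n). */
    static void downHeap(long[] values, int parent) {
        while (true) {
            int smallest = parent;
            int left = 2 * parent + 1;
            int right = left + 1;
            if (left < values.length && values[left] < values[smallest]) {
                smallest = left;
            }
            if (right < values.length && values[right] < values[smallest]) {
                smallest = right;
            }
            if (smallest == parent) {
                return; // heap property holds below this point
            }
            long tmp = values[parent];
            values[parent] = values[smallest];
            values[smallest] = tmp;
            parent = smallest;
        }
    }

    /** Checks that every parent is less than or equal to its children. */
    static boolean isMinHeap(long[] values) {
        for (int child = 1; child < values.length; child++) {
            if (values[(child - 1) / 2] > values[child]) {
                return false;
            }
        }
        return true;
    }
}
```

Starting at `length / 2 - 1` skips the leaves, which are already trivially valid heaps.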
Expand Down Expand Up @@ -443,7 +443,7 @@ public final void collect(int doc, long bucket) throws IOException {
/**
* {@code true} if the sort value for the doc is "better" than the
* entry at {@code index}. "Better" means "lower" for
* {@link SortOrder#ASC} and "higher" for {@link SortOrder#DESC}.
* {@link SortOrder#ASC} and "higher" for {@link SortOrder#DESC}.
*/
protected abstract boolean docBetterThan(long index);

Expand Down Expand Up @@ -545,7 +545,7 @@ public abstract static class ForFloats extends BucketedSort {
* The maximum size of buckets this can store. This is because we
* store the next offset to write to in a float and floats only have
* {@code 23} bits of mantissa so they can't accurately store values
* higher than {@code 2 ^ 24}.
* higher than {@code 2 ^ 24}.
*/
public static final int MAX_BUCKET_SIZE = (int) Math.pow(2, 24);
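
The `2 ^ 24` limit can be checked with a short round-trip test. A `float` has 23 explicit mantissa bits plus an implicit leading bit, so every integer up to 2^24 is exactly representable, while 2^24 + 1 is not. The `FloatPrecisionSketch` class below is a hypothetical helper written only for this illustration.

```java
/** Hypothetical helper showing why offsets stored in a float are capped at 2^24. */
final class FloatPrecisionSketch {

    /** Returns true when {@code value} survives a round trip through a float unchanged. */
    static boolean fitsInFloat(int value) {
        return (int) (float) value == value;
    }
}
```

For instance, `fitsInFloat(1 << 24)` holds but `fitsInFloat((1 << 24) + 1)` does not, because the latter rounds to 2^24 when converted to `float`.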

Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
import org.elasticsearch.search.sort.BucketedSort;
import org.elasticsearch.search.sort.SortBuilder;
import org.elasticsearch.search.sort.SortValue;
import org.elasticsearch.xpack.core.common.search.aggregations.MissingHelper;
import org.elasticsearch.xpack.analytics.topmetrics.InternalTopMetrics.MetricValue;

import java.io.IOException;
Expand Down Expand Up @@ -495,62 +496,4 @@ public Loader loader(LeafReaderContext ctx) throws IOException {
public void close() {}
}

/**
* Helps {@link LongMetricValues} track "empty" slots. It attempts to have
* very low CPU overhead and no memory overhead when there *aren't* empty
* values.
*/
private static class MissingHelper implements Releasable {
private final BigArrays bigArrays;
private BitArray tracker;

MissingHelper(BigArrays bigArrays) {
this.bigArrays = bigArrays;
}

void markMissing(long index) {
if (tracker == null) {
tracker = new BitArray(index, bigArrays);
}
tracker.set(index);
}

void markNotMissing(long index) {
if (tracker == null) {
return;
}
tracker.clear(index);
}

void swap(long lhs, long rhs) {
if (tracker == null) {
return;
}
boolean backup = tracker.get(lhs);
if (tracker.get(rhs)) {
tracker.set(lhs);
} else {
tracker.clear(lhs);
}
if (backup) {
tracker.set(rhs);
} else {
tracker.clear(rhs);
}
}

boolean isEmpty(long index) {
if (tracker == null) {
return false;
}
return tracker.get(index);
}

@Override
public void close() {
if (tracker != null) {
tracker.close();
}
}
}
}
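
The lazy-allocation idea behind `MissingHelper` (which this change removes here and imports from `org.elasticsearch.xpack.core.common.search.aggregations` instead) can be sketched with the JDK's `java.util.BitSet`. This is a stand-in, not the real class: `LazyMissingTracker` and `isAllocated` are hypothetical names, `BitSet` replaces `BitArray`/`BigArrays`, and indices are `int` rather than `long`.

```java
import java.util.BitSet;

/** Hypothetical stand-in that tracks "empty" slots without allocating until needed. */
final class LazyMissingTracker {
    private BitSet tracker; // stays null until the first slot is marked missing

    void markMissing(int index) {
        if (tracker == null) {
            tracker = new BitSet();
        }
        tracker.set(index);
    }

    void markNotMissing(int index) {
        if (tracker == null) {
            return; // nothing was ever missing, so there is nothing to clear
        }
        tracker.clear(index);
    }

    /** Exchanges the missing flags of two slots. */
    void swap(int lhs, int rhs) {
        if (tracker == null) {
            return;
        }
        boolean backup = tracker.get(lhs);
        tracker.set(lhs, tracker.get(rhs));
        tracker.set(rhs, backup);
    }

    boolean isEmpty(int index) {
        return tracker != null && tracker.get(index);
    }

    /** Exposed only so the sketch can show that no memory is used up front. */
    boolean isAllocated() {
        return tracker != null;
    }
}
```

Every method checks for `null` first, so the common case with no missing values costs one comparison and no memory.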
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,8 @@ public enum Feature {

SPATIAL_GEO_GRID(OperationMode.GOLD, true),

SPATIAL_GEO_LINE(OperationMode.GOLD, true),

ANALYTICS(OperationMode.MISSING, true),

SEARCHABLE_SNAPSHOTS(OperationMode.ENTERPRISE, true);