Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Normalize Pipeline Aggregation #56399

Merged
merged 15 commits into from
May 14, 2020
Merged

Add Normalize Pipeline Aggregation #56399

merged 15 commits into from
May 14, 2020

Conversation

talevy
Copy link
Contributor

@talevy talevy commented May 8, 2020

This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.

The aggregations supports the following normalizations

  • rescale 0-1
  • rescale 0-100
  • percentage of sum
  • mean normalization
  • z-score normalization
  • softmax normalization

To specify which normalization is to be used, it can be specified
in the normalize agg's normalizer field.

For example:

{
  "normalize": {
    "buckets_path": <>,
    "normalizer": "percent"
  }
}

Closes #51005.

@talevy talevy added WIP :Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels May 8, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.

The aggregations supports the following normalizations

- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization

To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.

For example:

```
{
  "normalize": {
    "buckets_path": <>,
    "normalizer": "percent"
  }
}
```

Closes elastic#51005.
@nik9000 nik9000 self-requested a review May 8, 2020 12:09
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments because I'm excited about this!

static final ParseField NORMALIZER_FIELD = new ParseField("normalizer");

@SuppressWarnings("unchecked")
public static final ConstructingObjectParser<NormalizePipelineAggregationBuilder, String> PARSER = new ConstructingObjectParser<>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out InstantiatingObjectParser!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, I tried changing that parser to work here, but I think it deserves its own change. The InstantiatingObjectParser does not expose the Context in such a way that more constructor arguments can be passed in. I believe this can change, but I'd rather not do that here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

normalizedBucketValue = normalizer.normalize(thisBucketValue);
}

List<InternalAggregation> aggs = StreamSupport.stream(bucket.getAggregations().spliterator(), false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bucket.getAggregations().copyResults() does this without so much boiler plate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately, that method does not work in this context. I think a more dedicated cleanup for this boilerplate can be tackled outside of this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@talevy talevy requested a review from nik9000 May 12, 2020 05:30
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Merge whenever you are happy with the docs!

Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments. I think the only notable one is about handling terms agg as the parent bucket :)

Looks good!

--------------------------------------------------
// NOTCONSOLE

[[normalizer_pipeline-params]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make a note somewhere that this pipeline always uses a skip gap policy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call!


public class NormalizePipelineAggregationBuilder extends AbstractPipelineAggregationBuilder<NormalizePipelineAggregationBuilder> {
public static final String NAME = "normalize";
static final ParseField NORMALIZER_FIELD = new ParseField("normalizer");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with normalizer, but wanted to also suggest method as a potential param name. No strong opinion though :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wishy washy on the naming here as well, and decided not to fret, but I too have leaned towards method earlier, so I am happy to do so here. especially given the overloading of the term across the stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the naming to be method

if (bucketsPaths.length != 1) {
context.addBucketPathValidationError("must contain a single entry for aggregation [" + name + "]");
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check context.validateHasParent() to make sure this isn't at the top level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yes. I wasn't aware of this. thanks for bringing it up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a check and a test for this!

histo = (InternalMultiBucketAggregation<? extends InternalMultiBucketAggregation, ? extends
InternalMultiBucketAggregation.InternalBucket>) aggregation;
List<? extends InternalMultiBucketAggregation.InternalBucket> buckets = histo.getBuckets();
HistogramFactory factory = (HistogramFactory) histo;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know if this works with a terms agg as the parent? It feels like it should (e.g. it doesn't require any specific ordering of the buckets, unlike something like a moving avg which needs an ordering).

If we think it should work with terms we should tweak this to not use a HistogramFactory directly. BucketScriptPipelineAggregator has an example of how to generically build buckets from any InternalMultiBucketAggregation (the internal agg can create buckets too, not just the factory).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! I was slightly loose in my interpretation of the HistogramFactory's comment

/** Implemented by histogram aggregations and used by pipeline aggregations to insert buckets. */

Will look at how BucketScript does things and add a test for terms agg!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes! I'm sorry I didn't notice this one!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I've updated to include a test for terms and use a more generic way to make new buckets

Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@talevy talevy merged commit 79367e4 into elastic:master May 14, 2020
@talevy talevy deleted the normalize branch May 14, 2020 20:32
talevy added a commit to talevy/elasticsearch that referenced this pull request May 14, 2020
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.

The aggregations supports the following normalizations

- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization

To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.

For example:

```
{
  "normalize": {
    "buckets_path": <>,
    "normalizer": "percent"
  }
}
```

Closes elastic#51005.
talevy added a commit that referenced this pull request May 15, 2020
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.

The aggregations supports the following normalizations

- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization

To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.

For example:

```
{
  "normalize": {
    "buckets_path": <>,
    "normalizer": "percent"
  }
}
```
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Jul 28, 2020
Relates: elastic/elasticsearch#56399

This commit adds the normalize aggregation to
the high level client.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Jul 31, 2020
Relates: elastic/elasticsearch#56399

This commit adds the normalize aggregation to
the high level client.
github-actions bot pushed a commit to elastic/elasticsearch-net that referenced this pull request Jul 31, 2020
Relates: elastic/elasticsearch#56399

This commit adds the normalize aggregation to
the high level client.
github-actions bot pushed a commit to elastic/elasticsearch-net that referenced this pull request Jul 31, 2020
Relates: elastic/elasticsearch#56399

This commit adds the normalize aggregation to
the high level client.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Jul 31, 2020
Relates: elastic/elasticsearch#56399

This commit adds the normalize aggregation to
the high level client.

Co-authored-by: Russ Cam <russ.cam@elastic.co>
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Jul 31, 2020
Relates: elastic/elasticsearch#56399

This commit adds the normalize aggregation to
the high level client.

Co-authored-by: Russ Cam <russ.cam@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.9.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Normalization pipeline aggregations
6 participants