Added standard deviation / variance sampling to extended stats (elastic#49782)

Per #49554, I added standard deviation sampling and variance sampling to the extended stats interface.

Closes elastic#49554

Co-authored-by: Igor Motov <igor@motovs.org>
andrewjohnson2 and imotov committed Jun 10, 2020
1 parent 9eb8085 commit 42e2308
Showing 9 changed files with 637 additions and 59 deletions.
@@ -21,6 +21,7 @@ GET /exams/_search

The above aggregation computes the grades statistics over all documents. The aggregation type is `extended_stats` and the `field` setting defines the numeric field of the documents the stats will be computed on. The above will return the following:

+ The `std_deviation` and `variance` are calculated as population metrics so they are always the same as `std_deviation_population` and `variance_population` respectively.

[source,console-result]
--------------------------------------------------
@@ -36,10 +37,18 @@ The above aggregation computes the grades statistics over all documents. The agg
"sum": 150.0,
"sum_of_squares": 12500.0,
"variance": 625.0,
+ "variance_population": 625.0,
+ "variance_sampling": 1250.0,
"std_deviation": 25.0,
+ "std_deviation_population": 25.0,
+ "std_deviation_sampling": 35.35533905932738,
"std_deviation_bounds": {
"upper": 125.0,
- "lower": 25.0
+ "lower": 25.0,
+ "upper_population" : 125.0,
+ "lower_population" : 25.0,
+ "upper_sampling" : 145.71067811865476,
+ "lower_sampling" : 4.289321881345245
}
}
}
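The population and sampling figures in the response above can be reproduced from the running totals alone. The sketch below is illustrative Java, not code from this commit; it assumes the two indexed grades are 50 and 100, which is consistent with `sum` 150.0 and `sum_of_squares` 12500.0, and the class name `ExtendedStatsSketch` is hypothetical.

```java
// Illustrative sketch (not Elasticsearch code): derive population and sample
// variance / standard deviation from count, sum and sum of squares.
// Assumption: the indexed grades are 50 and 100 (matches sum 150, sum_of_squares 12500).
class ExtendedStatsSketch {

    // Population variance: sum of squared deviations divided by n.
    static double variancePopulation(long count, double sum, double sumOfSquares) {
        double v = (sumOfSquares - (sum * sum) / count) / count;
        return v < 0 ? 0 : v; // clamp small negative values caused by rounding
    }

    // Sample variance: same numerator, divided by n - 1 (Bessel's correction).
    static double varianceSampling(long count, double sum, double sumOfSquares) {
        double v = (sumOfSquares - (sum * sum) / count) / (count - 1);
        return v < 0 ? 0 : v;
    }

    public static void main(String[] args) {
        double[] grades = {50.0, 100.0}; // assumed input values
        long count = grades.length;
        double sum = 0, sumOfSquares = 0;
        for (double g : grades) {
            sum += g;
            sumOfSquares += g * g;
        }
        double varPop = variancePopulation(count, sum, sumOfSquares);
        double varSample = varianceSampling(count, sum, sumOfSquares);
        System.out.println("variance_population      = " + varPop);
        System.out.println("variance_sampling        = " + varSample);
        System.out.println("std_deviation_population = " + Math.sqrt(varPop));
        System.out.println("std_deviation_sampling   = " + Math.sqrt(varSample));
    }
}
```

Dividing by `count - 1` instead of `count` is what makes the sampling values larger; with only two values the sample variance is exactly twice the population variance.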
@@ -75,6 +84,9 @@ GET /exams/_search
`sigma` can be any non-negative double, meaning you can request non-integer values such as `1.5`. A value of `0` is valid, but will simply
return the average for both `upper` and `lower` bounds.

+ The `upper` and `lower` bounds are calculated as population metrics so they are always the same as `upper_population` and
+ `lower_population` respectively.
+
.Standard Deviation and Bounds require normality
[NOTE]
=====
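A minimal sketch of the bound arithmetic described above (the average plus or minus `sigma` standard deviations), computed once with the population standard deviation and once with the sample one. It assumes `sigma` is 2, which reproduces the `upper`/`lower` values of 125.0 and 25.0 in the example response; `StdDeviationBoundsSketch` and `bounds` are hypothetical names, and rejecting a negative `sigma` is a choice made for the sketch based on the documented non-negative range.

```java
// Illustrative sketch (not Elasticsearch code) of std_deviation_bounds:
// lower = avg - sigma * stdDev, upper = avg + sigma * stdDev.
class StdDeviationBoundsSketch {

    // Returns {lower, upper}; this sketch rejects negative sigma.
    static double[] bounds(double avg, double stdDeviation, double sigma) {
        if (sigma < 0) {
            throw new IllegalArgumentException("sigma must be non-negative");
        }
        return new double[] {avg - stdDeviation * sigma, avg + stdDeviation * sigma};
    }

    public static void main(String[] args) {
        double avg = 75.0;                           // example average
        double stdPopulation = Math.sqrt(625.0);     // from variance_population
        double stdSampling = Math.sqrt(1250.0);      // from variance_sampling

        double[] pop = bounds(avg, stdPopulation, 2.0);    // assumed sigma = 2
        double[] sample = bounds(avg, stdSampling, 2.0);
        double[] zero = bounds(avg, stdPopulation, 0.0);   // sigma 0 collapses to the average

        System.out.println("population: lower=" + pop[0] + " upper=" + pop[1]);
        System.out.println("sampling:   lower=" + sample[0] + " upper=" + sample[1]);
        System.out.println("sigma 0:    lower=" + zero[0] + " upper=" + zero[1]);
    }
}
```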
@@ -93,9 +105,9 @@ GET /exams/_search
{
"size": 0,
"aggs" : {
- "grades_stats" : {
- "extended_stats" : {
- "script" : {
+ "grades_stats" : {
+ "extended_stats" : {
+ "script" : {
"source" : "doc['grade'].value",
"lang" : "painless"
}
@@ -114,8 +126,8 @@ GET /exams/_search
{
"size": 0,
"aggs" : {
- "grades_stats" : {
- "extended_stats" : {
+ "grades_stats" : {
+ "extended_stats" : {
"script" : {
"id": "my_script",
"params": {
@@ -113,10 +113,18 @@ And the following may be the response:
"sum": 985.0,
"sum_of_squares": 446725.0,
"variance": 41105.55555555556,
+ "variance_population": 41105.55555555556,
+ "variance_sampling": 61658.33333333334,
"std_deviation": 202.74505063146563,
+ "std_deviation_population": 202.74505063146563,
+ "std_deviation_sampling": 248.3109609609156,
"std_deviation_bounds": {
"upper": 733.8234345962646,
- "lower": -77.15676792959795
+ "lower": -77.15676792959795,
+ "upper_population" : 733.8234345962646,
+ "lower_population" : -77.15676792959795,
+ "upper_sampling" : 824.9552552551645,
+ "lower_sampling" : -168.28858858849787
}
}
}
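In the pipeline response above, the sampling figures follow from the population ones by the factor n / (n - 1). The excerpt does not show the bucket count, so this sketch assumes n = 3, the value for which 61658.33333333334 / 41105.55555555556 = n / (n - 1) = 1.5; it also assumes `sigma` = 2, which reproduces `upper_population` 733.82... from the mean 985 / 3. `SampleFromPopulationSketch` and `toSampleVariance` are hypothetical names, and returning NaN for fewer than two values is a choice of the sketch.

```java
// Illustrative sketch: convert a population variance to a sample variance
// with the factor n / (n - 1), then rebuild the sampling bounds.
// Assumptions: n = 3 (inferred from the ratio above) and sigma = 2.
class SampleFromPopulationSketch {

    static double toSampleVariance(double populationVariance, long count) {
        if (count < 2) {
            return Double.NaN; // sample variance is undefined for fewer than two values
        }
        return populationVariance * count / (count - 1);
    }

    public static void main(String[] args) {
        long count = 3;                     // inferred, not shown in the response
        double avg = 985.0 / count;         // "sum" divided by the count
        double varPop = 41105.55555555556;  // "variance_population"
        double varSample = toSampleVariance(varPop, count);
        double stdSample = Math.sqrt(varSample);
        double sigma = 2.0;                 // assumed

        System.out.println("variance_sampling      ~ " + varSample);
        System.out.println("std_deviation_sampling ~ " + stdSample);
        System.out.println("upper_sampling         ~ " + (avg + sigma * stdSample));
        System.out.println("lower_sampling         ~ " + (avg - sigma * stdSample));
    }
}
```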
@@ -7,7 +7,7 @@
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
- * http://www.apache.org/licenses/LICENSE-2.0
+ * http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
@@ -29,25 +29,55 @@ public interface ExtendedStats extends Stats {
double getSumOfSquares();

/**
- * The variance of the collected values.
+ * The population variance of the collected values.
*/
double getVariance();

/**
- * The standard deviation of the collected values.
+ * The population variance of the collected values.
+ */
+ double getVariancePopulation();
+
+ /**
+ * The sampling variance of the collected values.
+ */
+ double getVarianceSampling();
+
+ /**
+ * The population standard deviation of the collected values.
*/
double getStdDeviation();

+ /**
+ * The population standard deviation of the collected values.
+ */
+ double getStdDeviationPopulation();
+
+ /**
+ * The sampling standard deviation of the collected values.
+ */
+ double getStdDeviationSampling();
+
/**
* The upper or lower bounds of the stdDeviation
*/
double getStdDeviationBound(Bounds bound);

/**
- * The standard deviation of the collected values as a String.
+ * The population standard deviation of the collected values as a String.
*/
String getStdDeviationAsString();

+ /**
+ * The population standard deviation of the collected values as a String.
+ */
+ String getStdDeviationPopulationAsString();
+
+ /**
+ * The sampling standard deviation of the collected values as a String.
+ */
+ String getStdDeviationSamplingAsString();
+
/**
* The upper or lower bounds of stdDev of the collected values as a String.
*/
@@ -60,13 +90,22 @@ public interface ExtendedStats extends Stats {
String getSumOfSquaresAsString();

/**
- * The variance of the collected values as a String.
+ * The population variance of the collected values as a String.
*/
String getVarianceAsString();

+ /**
+ * The population variance of the collected values as a String.
+ */
+ String getVariancePopulationAsString();
+
+ /**
+ * The sampling variance of the collected values as a String.
+ */
+ String getVarianceSamplingAsString();
+
enum Bounds {
- UPPER, LOWER
+ UPPER, LOWER, UPPER_POPULATION, LOWER_POPULATION, UPPER_SAMPLING, LOWER_SAMPLING
}

}
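To see how the six `Bounds` constants relate, here is a hypothetical, trimmed stand-in for the widened interface that runs without Elasticsearch on the classpath. `MiniExtendedStats` and `Snapshot` are invented for illustration and are not the real `InternalExtendedStats`; treating `UPPER`/`LOWER` as aliases of the population bounds follows the documentation change in this commit. The sketch assumes Java 16+ (records, switch expressions).

```java
// Hypothetical stand-in for a subset of ExtendedStats, illustrating the Bounds mapping.
class BoundsMappingSketch {

    enum Bounds { UPPER, LOWER, UPPER_POPULATION, LOWER_POPULATION, UPPER_SAMPLING, LOWER_SAMPLING }

    interface MiniExtendedStats {
        double getVariance();            // population variance, as documented
        double getVariancePopulation();
        double getVarianceSampling();
        double getStdDeviationBound(Bounds bound);
    }

    // Holds precomputed statistics; sigma scales the standard deviation in the bounds.
    record Snapshot(double avg, double variancePopulation, double varianceSampling, double sigma)
            implements MiniExtendedStats {

        public double getVariance() { return variancePopulation; }
        public double getVariancePopulation() { return variancePopulation; }
        public double getVarianceSampling() { return varianceSampling; }

        public double getStdDeviationBound(Bounds bound) {
            double pop = Math.sqrt(variancePopulation);
            double sample = Math.sqrt(varianceSampling);
            return switch (bound) {
                case UPPER, UPPER_POPULATION -> avg + pop * sigma;  // UPPER aliases the population bound
                case LOWER, LOWER_POPULATION -> avg - pop * sigma;
                case UPPER_SAMPLING -> avg + sample * sigma;
                case LOWER_SAMPLING -> avg - sample * sigma;
            };
        }
    }

    public static void main(String[] args) {
        // Values from the extended_stats example response, with an assumed sigma of 2.
        MiniExtendedStats stats = new Snapshot(75.0, 625.0, 1250.0, 2.0);
        for (Bounds b : Bounds.values()) {
            System.out.println(b + " = " + stats.getStdDeviationBound(b));
        }
    }
}
```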
@@ -162,7 +162,11 @@ public double metric(String name, long owningBucketOrd) {
case avg: return Double.NaN;
case sum_of_squares: return 0;
case variance: return Double.NaN;
+ case variance_population: return Double.NaN;
+ case variance_sampling: return Double.NaN;
case std_deviation: return Double.NaN;
+ case std_deviation_population: return Double.NaN;
+ case std_deviation_sampling: return Double.NaN;
case std_upper: return Double.NaN;
case std_lower: return Double.NaN;
default:
@@ -177,7 +181,11 @@ public double metric(String name, long owningBucketOrd) {
case avg: return sums.get(owningBucketOrd) / counts.get(owningBucketOrd);
case sum_of_squares: return sumOfSqrs.get(owningBucketOrd);
case variance: return variance(owningBucketOrd);
+ case variance_population: return variancePopulation(owningBucketOrd);
+ case variance_sampling: return varianceSampling(owningBucketOrd);
case std_deviation: return Math.sqrt(variance(owningBucketOrd));
+ case std_deviation_population: return Math.sqrt(variance(owningBucketOrd));
+ case std_deviation_sampling: return Math.sqrt(varianceSampling(owningBucketOrd));
case std_upper:
return (sums.get(owningBucketOrd) / counts.get(owningBucketOrd)) + (Math.sqrt(variance(owningBucketOrd)) * this.sigma);
case std_lower:
@@ -188,12 +196,23 @@ public double metric(String name, long owningBucketOrd) {
}

private double variance(long owningBucketOrd) {
+ return variancePopulation(owningBucketOrd);
+ }
+
+ private double variancePopulation(long owningBucketOrd) {
double sum = sums.get(owningBucketOrd);
long count = counts.get(owningBucketOrd);
double variance = (sumOfSqrs.get(owningBucketOrd) - ((sum * sum) / count)) / count;
return variance < 0 ? 0 : variance;
}

+ private double varianceSampling(long owningBucketOrd) {
+ double sum = sums.get(owningBucketOrd);
+ long count = counts.get(owningBucketOrd);
+ double variance = (sumOfSqrs.get(owningBucketOrd) - ((sum * sum) / count)) / (count - 1);
+ return variance < 0 ? 0 : variance;
+ }
+
@Override
public InternalAggregation buildAggregation(long bucket) {
if (valuesSource == null || bucket >= counts.size()) {
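A sketch of the per-bucket `metric(name)` lookup above, restricted to the variance and standard-deviation metrics and written over a hypothetical `Bucket` record instead of the aggregator's per-bucket arrays. It reuses the diff's formulas, which makes two edge cases visible: an empty bucket reports NaN, and for a single value the sampling formula evaluates 0.0 / 0, which Java defines as NaN (and NaN passes the `< 0` clamp unchanged), while the population variance is 0. `MetricLookupSketch` is a hypothetical name, and throwing on an unknown metric name is an assumption of the sketch.

```java
// Illustrative sketch (not the real aggregator): metric lookup by name over one bucket.
class MetricLookupSketch {

    record Bucket(long count, double sum, double sumOfSqrs) {}

    static double variancePopulation(Bucket b) {
        double variance = (b.sumOfSqrs() - (b.sum() * b.sum()) / b.count()) / b.count();
        return variance < 0 ? 0 : variance;
    }

    static double varianceSampling(Bucket b) {
        double variance = (b.sumOfSqrs() - (b.sum() * b.sum()) / b.count()) / (b.count() - 1);
        return variance < 0 ? 0 : variance; // NaN < 0 is false, so NaN is returned as-is
    }

    static double metric(String name, Bucket b) {
        if (b.count() == 0) {
            // Empty bucket: every variance / std-dev metric is NaN.
            return switch (name) {
                case "variance", "variance_population", "variance_sampling",
                     "std_deviation", "std_deviation_population", "std_deviation_sampling" -> Double.NaN;
                default -> throw new IllegalArgumentException("unknown metric: " + name);
            };
        }
        return switch (name) {
            case "variance", "variance_population" -> variancePopulation(b);
            case "variance_sampling" -> varianceSampling(b);
            case "std_deviation", "std_deviation_population" -> Math.sqrt(variancePopulation(b));
            case "std_deviation_sampling" -> Math.sqrt(varianceSampling(b));
            default -> throw new IllegalArgumentException("unknown metric: " + name);
        };
    }

    public static void main(String[] args) {
        Bucket empty = new Bucket(0, 0.0, 0.0);
        Bucket single = new Bucket(1, 5.0, 25.0);
        Bucket pair = new Bucket(2, 150.0, 12500.0);
        for (String name : new String[] {"variance_population", "variance_sampling", "std_deviation_sampling"}) {
            System.out.println(name + ": empty=" + metric(name, empty)
                    + " single=" + metric(name, single) + " pair=" + metric(name, pair));
        }
    }
}
```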