Add standard deviation / variance sampling to extended stats aggregation #49554

Closed
costin opened this issue Nov 25, 2019 · 5 comments · Fixed by #49782
Labels
:Analytics/Aggregations Aggregations >enhancement good first issue low hanging fruit Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@costin
Member

costin commented Nov 25, 2019

Currently Elasticsearch offers standard deviation (STDDEV) and variance (VAR) only in their population form. There is also a sampling form, which divides the sum of squared deviations by n - 1 instead of n; depending on the data size, it can yield significantly different results, with the gap largest for small samples.
As it's just a matter of a (somewhat) different formula, it should be straightforward to extend the current Extended Stats implementation to support this variant as well.
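For reference, these are the standard textbook definitions of the two variants, written in terms of the count n, the sum S_1 and the sum of squares S_2 that the extended stats response already reports (as count, sum and sum_of_squares). The symbols are notation chosen here for illustration, not Elasticsearch field names; the standard deviations are the square roots of the respective variances, and the sampling form is undefined for n = 1.

```latex
\begin{aligned}
S_1 &= \sum_{i=1}^{n} x_i, \qquad S_2 = \sum_{i=1}^{n} x_i^2, \qquad \bar{x} = \frac{S_1}{n} \\
\sigma^2_{\text{population}} &= \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{S_2}{n} - \bar{x}^2 \\
s^2_{\text{sampling}} &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{S_2 - S_1^2/n}{n-1} = \frac{n}{n-1}\,\sigma^2_{\text{population}}
\end{aligned}
```

Because both forms need only n, S_1 and S_2, the sampling variant can be derived from statistics the aggregation already collects.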

To avoid ambiguity going forward, the current std_deviation could be aliased as std_deviation_population (and likewise for variance), so that users can easily pick the desired type while also being clear about which type the default fields report.

The improved response can look something like this:

{
    ...

    "aggregations": {
        "grades_stats": {
            "count": 2,
            "min": 50.0,
            "max": 100.0,
            "avg": 75.0,
            "sum": 150.0,
            "sum_of_squares": 12500.0,
            "variance": 625.0,
            "variance_population": 625.0,       // same as "variance"
            "variance_sampling": ...,
            "std_deviation": 25.0,
            "std_deviation_population": 25.0,   // same as "std_deviation"
            "std_deviation_sampling": ...,
            "std_deviation_bounds": {
                "upper": 125.0,
                "lower": 25.0
            }
        }
    }
}
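A minimal Python sketch of how the proposed fields could be derived from raw values, assuming the textbook definitions (population variance divides by n, sampling variance by n - 1). `extended_stats` is a hypothetical helper for illustration, not Elasticsearch's (Java) implementation. The bounds here use the population standard deviation with sigma = 2, which reproduces the bounds in the sample response (75 ± 2 × 25).

```python
import math


def extended_stats(values, sigma=2.0):
    """Hypothetical helper: compute the proposed extended stats fields for a list of numbers."""
    n = len(values)
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    avg = total / n

    # Population variance: mean of the squares minus the square of the mean.
    # Clamp at 0.0 so floating-point error cannot produce a tiny negative value.
    var_pop = max(0.0, sum_sq / n - avg * avg)

    # Sampling variance applies Bessel's correction (divide by n - 1); undefined for n < 2.
    if n > 1:
        var_samp = max(0.0, (sum_sq - total * total / n) / (n - 1))
    else:
        var_samp = float("nan")

    std_pop = math.sqrt(var_pop)
    std_samp = math.sqrt(var_samp)  # math.sqrt(nan) returns nan

    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_sq,
        "variance": var_pop,
        "variance_population": var_pop,  # alias of "variance"
        "variance_sampling": var_samp,
        "std_deviation": std_pop,
        "std_deviation_population": std_pop,  # alias of "std_deviation"
        "std_deviation_sampling": std_samp,
        # Bounds computed from the population standard deviation in this sketch.
        "std_deviation_bounds": {
            "upper": avg + sigma * std_pop,
            "lower": avg - sigma * std_pop,
        },
    }
```

For the two grades in the example (50.0 and 100.0), this reproduces the population figures of the sample response and gives variance_sampling = 1250.0 and std_deviation_sampling ≈ 35.36: with n = 2, the sampling variance is twice the population variance.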
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@hoonti06

I am a beginner. Can I start working on this?

@polyfractal
Contributor

Hi @hoonti06, sure! Let me know if you have questions or need some guidance :)

@shellfish1

Is anyone currently working on this?

@andrewjohnson2
Contributor

Hi! I added a pull request for this issue.

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
imotov added a commit that referenced this issue Jun 10, 2020
Per 49554 I added standard deviation sampling and variance sampling to the extended stats interface.

Closes #49554

Co-authored-by: Igor Motov <igor@motovs.org>
imotov added a commit to imotov/elasticsearch that referenced this issue Jun 10, 2020
…ic#49782)

Per 49554 I added standard deviation sampling and variance sampling to the extended stats interface.

Closes elastic#49554

Co-authored-by: Igor Motov <igor@motovs.org>
imotov added a commit that referenced this issue Jun 11, 2020
… (#57947)

Per 49554 I added standard deviation sampling and variance sampling to the extended stats interface.
 
Closes #49554

Co-authored-by: Igor Motov <igor@motovs.org>

Co-authored-by: andrewjohnson2 <aj114114@gmail.com>
7 participants