[ML] Use random sampler for aggregations for Data Visualizer document count chart #136124

qn895 · 2022-07-11T16:43:12Z

Describe the feature:

Currently, the ML Data Visualizer does not use any sampling when running aggregations for the document count chart. For large indices, we would benefit greatly from enabling the random sampler aggregation when appropriate. To enable the new sampling method, we need to:

Determine whether the random sampler is appropriate to use (e.g. only use it for indices/queries with more than 1 million hits) to ensure a sufficient sample size.
Account for the speed improvement over a vanilla aggregation without sampling (e.g. opt for the vanilla aggregation for queries with fewer than 10 million docs).
Clearly indicate that the total document count, as well as the chart itself, is approximate when the random sampler is used.
Reconcile the difference in the populated %. Currently we use the total hit count/total document count to calculate the % of docs in which a field is populated.
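As a rough illustration of the first two requirements, the sketch below builds a search body that wraps the document count histogram in a `random_sampler` aggregation only when the probability is below 1, and falls back to the plain aggregation otherwise. The function name `build_doc_count_request`, the aggregation names `sampling` and `doc_count_over_time`, and the use of a `date_histogram` with `fixed_interval` are assumptions made for this example, not the actual Kibana implementation.

```python
from typing import Any, Dict


def build_doc_count_request(
    time_field: str,
    interval: str,
    probability: float,
) -> Dict[str, Any]:
    """Build a search body for the document count chart (illustrative only).

    Aggregation names and the overall shape are assumptions for this sketch.
    """
    if probability <= 0:
        raise ValueError("probability must be greater than 0")

    # The histogram that backs the chart; the same definition is used
    # with or without sampling.
    histogram = {
        "doc_count_over_time": {
            "date_histogram": {"field": time_field, "fixed_interval": interval}
        }
    }

    # A probability of 1 means "no sampling": run the vanilla aggregation.
    if probability >= 1:
        return {"size": 0, "aggs": histogram}

    # Otherwise nest the histogram under the random sampler.
    return {
        "size": 0,
        "aggs": {
            "sampling": {
                "random_sampler": {"probability": probability},
                "aggs": histogram,
            }
        },
    }
```

Calling `build_doc_count_request("@timestamp", "1h", 0.001)` would produce a sampled request, while passing a probability of `1` returns the unsampled one, so the caller only has to decide on the probability.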

Proposed approach:

First make a query with a low default probability of 0.0001. From this initial result (which averages around 120ms), find the estimated number of total docs and calculate the next appropriate probability.
If the estimated number of total docs is < 10 million docs*, use a probability of 1 (i.e. do not use sampling at all).
If the estimated number of total docs is >= 10 million docs*, use the closest calculated probability. Then show this value in the probability slider and visually indicate that random sampling is in use.
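The steps above could be sketched roughly as follows. The initial probability (0.0001) and the 10 million threshold come from the proposal; everything else is an assumption for illustration, in particular the target sample size `TARGET_SAMPLE_DOCS`, the list of slider values `SLIDER_STEPS`, and the rule for picking the "closest" probability, none of which the proposal specifies. The sketch also assumes the probe yields a raw sampled document count that has to be divided by the probability; if the response already reports scaled counts, that step would be skipped.

```python
INITIAL_PROBABILITY = 0.0001          # from the proposal
SAMPLING_THRESHOLD_DOCS = 10_000_000  # from the proposal

# Hypothetical values, chosen only to make the example concrete.
TARGET_SAMPLE_DOCS = 100_000
SLIDER_STEPS = (0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.25)


def estimate_total_docs(sampled_doc_count: int, probability: float) -> float:
    """Scale a raw sampled count back up to an estimated total.

    Assumes the count passed in has NOT already been scaled.
    """
    return sampled_doc_count / probability


def next_probability(estimated_total_docs: float) -> float:
    """Pick the probability for the follow-up query."""
    # Below the threshold, sampling is not worth it: use every document.
    if estimated_total_docs < SAMPLING_THRESHOLD_DOCS:
        return 1.0

    # Probability that would yield roughly the target sample size.
    ideal = TARGET_SAMPLE_DOCS / estimated_total_docs

    # Snap to the nearest (hypothetical) slider step by absolute distance.
    return min(SLIDER_STEPS, key=lambda step: abs(step - ideal))


# Example: the probe at 0.0001 sampled 5,000 docs, i.e. roughly 50 million in total.
total = estimate_total_docs(5_000, INITIAL_PROBABILITY)
probability = next_probability(total)
```

Snapping by absolute distance is the simplest choice; comparing on a logarithmic scale may suit a slider whose steps span several orders of magnitude better, but that is a design decision the proposal leaves open.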

elasticmachine · 2022-07-11T16:43:14Z

Pinging @elastic/ml-ui (:ml)

qn895 · 2022-07-27T15:22:52Z

Closing via #136150

qn895 added the :ml, Feature:File and Index Data Viz, and v8.4.0 labels Jul 11, 2022

qn895 self-assigned this Jul 11, 2022

qn895 changed the title ~~[ML] Use random sampler for aggregations for Data visualizer document count chart~~ [ML] Use random sampler for aggregations for Data Visualizer document count chart Jul 11, 2022

qn895 mentioned this issue Jul 11, 2022

[ML] Add random sampler to Data visualizer document count chart #136150

Merged

1 task

qn895 closed this as completed Jul 27, 2022
