Currently, ML's Data Visualizer does not use any sampling when running aggregations for the document count chart. Large indices would benefit greatly from enabling the random sampler aggregation when appropriate. To enable the new sampling method, we need to:
Determine whether the random sampler is appropriate to use (e.g. only use it for indices/queries with more than 1 million hits) to ensure a sufficient sample size.
Account for the speed improvement over a vanilla aggregation without sampling (e.g. opt for a vanilla aggregation for queries with fewer than 10 million docs).
Clearly indicate that the total document count, as well as the chart itself, is approximate if the random sampler is used.
Reconcile the difference in the populated %. Currently we use the total hit count / total document count to calculate the % of docs in which a field is populated.
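As a rough illustration of the requirements above, the sketch below wraps a document count histogram in a `random_sampler` aggregation only when the probability is below 1, and computes the populated % from counts taken from the same (possibly sampled) set so that the sampling probability cancels out. The function names, the aggregation names `sampler` and `doc_count_over_time`, and the idea of reconciling the populated % this way are assumptions for illustration, not something this issue specifies.

```python
def build_doc_count_request(time_field: str, interval: str, probability: float) -> dict:
    """Build a search body for the document count chart (hypothetical helper).

    As I understand the random_sampler docs, the probability has to be
    greater than 0 and at most 0.5, or exactly 1 (treat the exact bounds
    as an assumption and check them against your Elasticsearch version).
    """
    if not (0 < probability <= 0.5 or probability == 1):
        raise ValueError("probability must be in (0, 0.5] or exactly 1")

    histogram = {"date_histogram": {"field": time_field, "fixed_interval": interval}}

    if probability == 1:
        # No sampling: run the vanilla aggregation directly.
        return {"size": 0, "aggs": {"doc_count_over_time": histogram}}

    # Sampling: nest the histogram under a random_sampler aggregation.
    return {
        "size": 0,
        "aggs": {
            "sampler": {
                "random_sampler": {"probability": probability},
                "aggs": {"doc_count_over_time": histogram},
            }
        },
    }


def populated_percent(docs_with_field: int, docs_considered: int) -> float:
    """Percent of docs in which a field is populated (hypothetical helper).

    Assumes both counts come from the same set of documents (both sampled
    or both unsampled), so no rescaling by the probability is needed.
    """
    if docs_considered == 0:
        return 0.0
    return 100.0 * docs_with_field / docs_considered
```

Whichever way the populated % is reconciled, the UI would still need to label the result as approximate whenever the probability is below 1.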
Proposed approach:
First make a query with a low default probability of 0.0001. From this initial result (which takes around 120ms on average), find the estimated number of total docs and calculate the next appropriate probability.
If the estimated number of total docs < 10 million docs*, then use a probability of 1 (i.e. do not use sampling at all).
If the estimated number of total docs >= 10 million docs*, then use the closest calculated probability. Then show this value in the probability slider and visually indicate that random sampling is in use.
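The two-pass selection described above could be sketched roughly as follows. The initial probability (0.0001) and the 10 million threshold come from the proposal; the target sample size, the slider steps, and the rule of picking the smallest step that still meets the target are assumptions made up for this sketch, since the issue does not say how the "closest" probability is calculated.

```python
INITIAL_PROBABILITY = 0.0001     # from the proposal
SAMPLING_THRESHOLD = 10_000_000  # from the proposal: below this, do not sample

# Assumptions for illustration only (not specified in the issue):
TARGET_SAMPLE_SIZE = 100_000
SLIDER_STEPS = [0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5]


def estimate_total_docs(sampled_count: int, probability: float = INITIAL_PROBABILITY) -> int:
    """Estimate the total doc count from how many docs the first pass sampled."""
    return round(sampled_count / probability)


def choose_probability(estimated_total: int) -> float:
    """Pick the probability for the second pass."""
    if estimated_total < SAMPLING_THRESHOLD:
        return 1.0  # a probability of 1 means no sampling at all

    ideal = TARGET_SAMPLE_SIZE / estimated_total
    # Snap to the smallest slider step that is at least the ideal value,
    # so the expected sample does not fall below the target size.
    for step in SLIDER_STEPS:
        if step >= ideal:
            return step
    return SLIDER_STEPS[-1]


# Example: the first pass at probability 0.0001 sampled 1,200 docs.
total = estimate_total_docs(1200)        # about 12 million docs
probability = choose_probability(total)  # 0.01 with the assumed steps
```

The returned value is what would be shown in the probability slider; any value other than 1 is the cue to mark the chart and total count as approximate.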
qn895 changed the title from "[ML] Use random sampler for aggregations for Data visualizer document count chart" to "[ML] Use random sampler for aggregations for Data Visualizer document count chart" on Jul 11, 2022