[ML] Switch from normal sampling to random sampler for Index data visualizer table #144646

Merged
merged 22 commits into from
Nov 16, 2022
Changes from 19 commits
Commits
7857ec2
Change to random sampler
qn895 Nov 4, 2022
086ac92
Fix time field
qn895 Aug 16, 2022
5894931
Clean up, update functional tests
qn895 Nov 4, 2022
20edcf2
Tweak heading of column
qn895 Nov 4, 2022
454597b
Merge remote-tracking branch 'upstream/main' into ml-dv-random-sample…
qn895 Nov 4, 2022
9c5dcfd
Merge remote-tracking branch 'upstream/main' into ml-dv-random-sample…
qn895 Nov 9, 2022
0a6d366
Fix count & percentage, add loading indicating when sampling prob cha…
qn895 Nov 9, 2022
2a9f150
Add comment to num sampled, fix translation
qn895 Nov 9, 2022
9358d01
Match count for discover/lens and data viz
qn895 Nov 10, 2022
1556606
Match count for discover/lens and data viz
qn895 Nov 10, 2022
b1fb6ec
Clean up, i18n, types
qn895 Nov 10, 2022
c3c626d
Update tests
qn895 Nov 13, 2022
214482c
Update tests data view management
qn895 Nov 13, 2022
16bd9ef
Merge remote-tracking branch 'upstream/main' into ml-dv-random-sample…
qn895 Nov 13, 2022
97d46f6
Delete unused file, remove todos
qn895 Nov 14, 2022
cb33dd2
Merge remote-tracking branch 'upstream/main' into ml-dv-random-sample…
qn895 Nov 14, 2022
80872e0
Fix percentage message if undefined, move types,
qn895 Nov 14, 2022
33c645f
Move gear to the left, change debounce to 100ms
qn895 Nov 14, 2022
65902a4
Merge remote-tracking branch 'upstream/main' into ml-dv-random-sample…
qn895 Nov 14, 2022
1d5fe9b
Move cog icon to right, refactor onChange
qn895 Nov 15, 2022
7d7ba42
Fix count for file data visualizer
qn895 Nov 15, 2022
c24049c
Merge branch 'main' into ml-dv-random-sampler-part-2
kibanamachine Nov 15, 2022
@@ -24,6 +24,24 @@ import { useDiscoverServices } from '../../../../hooks/use_discover_services';
import { FIELD_STATISTICS_LOADED } from './constants';
import type { GetStateReturn } from '../../services/discover_state';
import { AvailableFields$, DataRefetch$, DataTotalHits$ } from '../../hooks/use_saved_search';
export interface RandomSamplingOption {
mode: 'random_sampling';
seed: string;
probability: number;
}

export interface NormalSamplingOption {
mode: 'normal_sampling';
seed: string;
shardSize: number;
}

export interface NoSamplingOption {
mode: 'no_sampling';
seed: string;
}

export type SamplingOption = RandomSamplingOption | NormalSamplingOption | NoSamplingOption;

export interface DataVisualizerGridEmbeddableInput extends EmbeddableInput {
dataView: DataView;
@@ -39,6 +57,7 @@ export interface DataVisualizerGridEmbeddableInput extends EmbeddableInput {
sessionId?: string;
fieldsToFetch?: string[];
totalDocuments?: number;
samplingOption?: SamplingOption;
}
export interface DataVisualizerGridEmbeddableOutput extends EmbeddableOutput {
showDistributions?: boolean;
@@ -163,6 +182,11 @@ export const FieldStatisticsTable = (props: FieldStatisticsTableProps) => {
totalDocuments: savedSearchDataTotalHits$
? savedSearchDataTotalHits$.getValue()?.result
: undefined,
samplingOption: {
mode: 'normal_sampling',
shardSize: 5000,
seed: searchSessionId,
} as NormalSamplingOption,
});
embeddable.reload();
}
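
For context on the two modes named in the title, here is a minimal sketch of how a `SamplingOption` might be turned into an Elasticsearch aggregation wrapper. `wrapWithSampling`, the `sampled` aggregation name, and the separate `numericSeed` parameter are assumptions made for illustration and are not taken from this PR. `random_sampler` (with `probability` and `seed`) and `sampler` (with `shard_size`) are the Elasticsearch aggregations that the two modes presumably map to.

```typescript
// Local copies of the option types from the diff, so the sketch is self-contained.
interface RandomSamplingOption {
  mode: 'random_sampling';
  seed: string;
  probability: number;
}
interface NormalSamplingOption {
  mode: 'normal_sampling';
  seed: string;
  shardSize: number;
}
interface NoSamplingOption {
  mode: 'no_sampling';
  seed: string;
}
type SamplingOption = RandomSamplingOption | NormalSamplingOption | NoSamplingOption;

type Aggs = Record<string, unknown>;

// Hypothetical helper (not from the PR): wraps `aggs` in the sampling
// aggregation that matches the option's mode.
function wrapWithSampling(aggs: Aggs, option: SamplingOption, numericSeed: number): Aggs {
  switch (option.mode) {
    case 'random_sampling':
      // A probability of 1 uses every document, so no wrapper is added.
      if (option.probability === 1) return aggs;
      return {
        sampled: {
          random_sampler: { probability: option.probability, seed: numericSeed },
          aggs,
        },
      };
    case 'normal_sampling':
      // The sampler aggregation caps the number of documents taken per shard.
      return { sampled: { sampler: { shard_size: option.shardSize }, aggs } };
    case 'no_sampling':
      return aggs;
  }
}
```

With the `normal_sampling` option set in the hunk above (`shardSize: 5000`), this sketch would produce a `sampler` wrapper with `shard_size: 5000`.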
@@ -22,10 +22,10 @@
* Otherwise you'd just satisfy TS requirements but might still
* run into runtime issues.
*/
export const isPopulatedObject = <U extends string = string>(
export const isPopulatedObject = <U extends string = string, T extends unknown = unknown>(
arg: unknown,
requiredAttributes: U[] = []
): arg is Record<U, unknown> => {
): arg is Record<U, T> => {
return (
typeof arg === 'object' &&
arg !== null &&
@@ -64,9 +64,7 @@ export interface FieldVisStats {
max?: number;
median?: number;
min?: number;
topValues?: Array<{ key: number | string; doc_count: number }>;
topValuesSampleSize?: number;
topValuesSamplerShardSize?: number;
topValues?: Array<{ key: number | string; doc_count: number; percent: number }>;
examples?: Array<string | GeoPointExample | object>;
timeRangeEarliest?: number;
timeRangeLatest?: number;
x-pack/plugins/data_visualizer/common/types/field_stats.ts (46 changes: 44 additions & 2 deletions)
@@ -11,6 +11,25 @@ import { IKibanaSearchResponse } from '@kbn/data-plugin/common';
import { isPopulatedObject } from '@kbn/ml-is-populated-object';
import { TimeBucketsInterval } from '../services/time_buckets';

export interface RandomSamplingOption {
mode: 'random_sampling';
seed: string;
probability: number;
}

export interface NormalSamplingOption {
mode: 'normal_sampling';
seed: string;
shardSize: number;
}

export interface NoSamplingOption {
mode: 'no_sampling';
seed: string;
}

export type SamplingOption = RandomSamplingOption | NormalSamplingOption | NoSamplingOption;

export interface FieldData {
fieldName: string;
existsInDocs: boolean;
@@ -54,7 +73,7 @@ export const isIKibanaSearchResponse = (arg: unknown): arg is IKibanaSearchRespo

export interface NumericFieldStats {
fieldName: string;
count: number;
count?: number;
min: number;
max: number;
avg: number;
@@ -86,7 +105,8 @@ export interface BooleanFieldStats {
count: number;
trueCount: number;
falseCount: number;
[key: string]: number | string;
topValues: Bucket[];
topValuesSampleSize: number;
}

export interface DocumentCountStats {
@@ -186,6 +206,9 @@ export interface FieldStatsCommonRequestParams {
intervalMs?: number;
query: estypes.QueryDslQueryContainer;
maxExamples?: number;
samplingProbability: number | null;
browserSessionSeed: number;
samplingOption: SamplingOption;
}

export interface OverallStatsSearchStrategyParams {
@@ -202,6 +225,8 @@ export interface OverallStatsSearchStrategyParams {
aggregatableFields: string[];
nonAggregatableFields: string[];
fieldsToFetch?: string[];
browserSessionSeed: number;
samplingOption: SamplingOption;
}

export interface FieldStatsSearchStrategyReturnBase {
@@ -238,3 +263,20 @@ export interface Field {
export interface Aggs {
[key: string]: estypes.AggregationsAggregationContainer;
}

export const EMBEDDABLE_SAMPLER_OPTION = {
RANDOM: 'random_sampling',
NORMAL: 'normal_sampling',
};
export type FieldStatsEmbeddableSamplerOption =
typeof EMBEDDABLE_SAMPLER_OPTION[keyof typeof EMBEDDABLE_SAMPLER_OPTION];

export function isRandomSamplingOption(arg: SamplingOption): arg is RandomSamplingOption {
return arg.mode === 'random_sampling';
}
export function isNormalSamplingOption(arg: SamplingOption): arg is NormalSamplingOption {
return arg.mode === 'normal_sampling';
}
export function isNoSamplingOption(arg: SamplingOption): arg is NoSamplingOption {
return arg.mode === 'no_sampling' || (arg.mode === 'random_sampling' && arg.probability === 1);
}
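
One detail of the guards above is easy to miss: `isNoSamplingOption` also returns true for a `random_sampling` option whose `probability` is 1, so the order in which the guards are checked matters. The sketch below restates the types and guards locally to stay self-contained. `describeSampling` and its label strings are hypothetical and only illustrate the ordering.

```typescript
interface RandomSamplingOption {
  mode: 'random_sampling';
  seed: string;
  probability: number;
}
interface NormalSamplingOption {
  mode: 'normal_sampling';
  seed: string;
  shardSize: number;
}
interface NoSamplingOption {
  mode: 'no_sampling';
  seed: string;
}
type SamplingOption = RandomSamplingOption | NormalSamplingOption | NoSamplingOption;

function isRandomSamplingOption(arg: SamplingOption): arg is RandomSamplingOption {
  return arg.mode === 'random_sampling';
}
function isNormalSamplingOption(arg: SamplingOption): arg is NormalSamplingOption {
  return arg.mode === 'normal_sampling';
}
function isNoSamplingOption(arg: SamplingOption): arg is NoSamplingOption {
  return arg.mode === 'no_sampling' || (arg.mode === 'random_sampling' && arg.probability === 1);
}

// Hypothetical consumer: check the "no sampling" guard first, because it
// also matches random sampling at probability 1.
function describeSampling(option: SamplingOption): string {
  if (isNoSamplingOption(option)) return 'all documents';
  if (isRandomSamplingOption(option)) return `random sample, probability ${option.probability}`;
  if (isNormalSamplingOption(option)) return `up to ${option.shardSize} documents per shard`;
  return 'unknown';
}
```

If `isRandomSamplingOption` were checked first, a probability of 1 would be reported as a random sample even though every document is used.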
@@ -20,7 +20,7 @@ import {
EuiFormRow,
} from '@elastic/eui';
import { i18n } from '@kbn/i18n';
import { sortedIndex } from 'lodash';
import { debounce, sortedIndex } from 'lodash';
import { FormattedMessage } from '@kbn/i18n-react';
import { isDefined } from '../../util/is_defined';
import type { DocumentCountChartPoint } from './document_count_chart';
@@ -64,6 +64,16 @@ export const DocumentCountContent: FC<Props> = ({
setShowSamplingOptionsPopover(false);
}, [setShowSamplingOptionsPopover]);

// eslint-disable-next-line react-hooks/exhaustive-deps
const updateSamplingProbability = useCallback(
debounce((p) => {
if (setSamplingProbability) {
setSamplingProbability(p);
}
}, 100),
[setSamplingProbability]
);

const calloutInfoMessage = useMemo(() => {
switch (randomSamplerPreference) {
case RANDOM_SAMPLER_OPTION.OFF:
@@ -124,7 +134,6 @@ export const DocumentCountContent: FC<Props> = ({
return (
<>
<EuiFlexGroup alignItems="center" gutterSize="xs">
<TotalCountHeader totalCount={totalCount} approximate={approximate} loading={loading} />
<EuiFlexItem grow={false}>
<EuiPopover
data-test-subj="dvRandomSamplerOptionsPopover"
@@ -210,9 +219,7 @@ export const DocumentCountContent: FC<Props> = ({
? closestPrev
: closestNext;

if (setSamplingProbability) {
setSamplingProbability(closestProbability / 100);
}
updateSamplingProbability(closestProbability / 100);
jgowdyelastic (Member) commented on Nov 15, 2022:

It would be more performant to put all of the logic in this onChange function inside the function that is wrapped in the debounce. There is no need to calculate closestProbability on every change if the result is just going to be discarded.

Also there is a useful hook called useDebounce which might work well here.
You could put the e.currentTarget.value in a temporary state variable e.g. newProbability and then useDebounce could watch for changes in that variable.
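
A minimal sketch of the first suggestion, with the closest-step lookup moved inside the debounced function. It is only an illustration: `EXAMPLE_STEPS`, `closestStep`, `makeOnChange`, and the hand-written `debounce` are hypothetical stand-ins (the component itself uses lodash's `debounce` and `sortedIndex`), and the step values are made up.

```typescript
// Hypothetical slider steps, in percent; the real list lives elsewhere in the plugin.
const EXAMPLE_STEPS = [1, 5, 10, 50];

// Returns the entry of a non-empty list that is nearest to `value`.
// On a tie, the earlier entry wins.
function closestStep(steps: number[], value: number): number {
  return steps.reduce((best, step) =>
    Math.abs(step - value) < Math.abs(best - value) ? step : best
  );
}

// Small trailing-edge debounce so the sketch has no dependencies.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// The handler only forwards the raw value. Parsing and the closest-step
// lookup run once per burst of changes, inside the debounced function.
function makeOnChange(setSamplingProbability: (p: number) => void) {
  return debounce((rawValue: string) => {
    const requested = Number(rawValue);
    if (isNaN(requested)) return;
    setSamplingProbability(closestStep(EXAMPLE_STEPS, requested) / 100);
  }, 100);
}
```

In a React component the debounced function would need to be created once (for example inside `useMemo`) so that the pending timer survives re-renders.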

}}
step={RANDOM_SAMPLER_STEP}
data-test-subj="dvRandomSamplerProbabilityRange"
@@ -226,6 +233,7 @@ export const DocumentCountContent: FC<Props> = ({
</EuiPopover>
<EuiFlexItem />
</EuiFlexItem>
<TotalCountHeader totalCount={totalCount} approximate={approximate} loading={loading} />
</EuiFlexGroup>
<DocumentCountChart
chartPoints={chartPoints}
@@ -112,6 +112,7 @@ export const FieldsStatsGrid: FC<Props> = ({ results }) => {
pageState={dataVisualizerListState}
updatePageState={setDataVisualizerListState}
getItemIdToExpandedRowMap={getItemIdToExpandedRowMap}
overallStatsRunning={false}
/>
</div>
);
@@ -10,7 +10,6 @@ import { EuiSpacer } from '@elastic/eui';
import { Axis, BarSeries, Chart, Settings, ScaleType } from '@elastic/charts';

import { FormattedMessage } from '@kbn/i18n-react';
import { i18n } from '@kbn/i18n';
import { TopValues } from '../../../top_values';
import type { FieldDataRowProps } from '../../types/field_data_row';
import { ExpandedRowFieldHeader } from '../expanded_row_field_header';
@@ -45,32 +44,13 @@ export const BooleanContent: FC<FieldDataRowProps> = ({ config, onAddFilter }) =
const theme = useDataVizChartTheme();
if (!formattedPercentages) return null;

const { trueCount, falseCount, count } = formattedPercentages;
const stats = {
...config.stats,
topValues: [
{
key: i18n.translate(
'xpack.dataVisualizer.dataGrid.fieldExpandedRow.booleanContent.trueCountLabel',
{ defaultMessage: 'true' }
),
doc_count: trueCount ?? 0,
},
{
key: i18n.translate(
'xpack.dataVisualizer.dataGrid.fieldExpandedRow.booleanContent.falseCountLabel',
{ defaultMessage: 'false' }
),
doc_count: falseCount ?? 0,
},
],
};
const { count } = formattedPercentages;
return (
<ExpandedRowContent dataTestSubj={'dataVisualizerBooleanContent'}>
<DocumentStatsTable config={config} />

<TopValues
stats={stats}
stats={config.stats}
fieldFormat={fieldFormat}
barColor="success"
onAddFilter={onAddFilter}
@@ -6,7 +6,7 @@
*/

import React, { FC, useMemo } from 'react';
import { EuiSpacer, EuiText, htmlIdGenerator } from '@elastic/eui';
import { EuiText, htmlIdGenerator } from '@elastic/eui';
import { i18n } from '@kbn/i18n';
import { FormattedMessage } from '@kbn/i18n-react';
import {
@@ -18,6 +18,8 @@ import {
VectorLayerDescriptor,
} from '@kbn/maps-plugin/common';
import { EMSTermJoinConfig } from '@kbn/maps-plugin/public';
import { ES_FIELD_TYPES, KBN_FIELD_TYPES } from '@kbn/field-types';
import { useDataVisualizerKibana } from '../../../../../kibana_context';
import { EmbeddedMapComponent } from '../../../embedded_map';
import { FieldVisStats } from '../../../../../../../common/types';
import { ExpandedRowPanel } from './expanded_row_panel';
@@ -97,13 +99,55 @@ interface Props {
}

export const ChoroplethMap: FC<Props> = ({ stats, suggestion }) => {
const { fieldName, isTopValuesSampled, topValues, topValuesSamplerShardSize } = stats!;
const {
services: {
data: { fieldFormats },
},
} = useDataVisualizerKibana();

const { fieldName, isTopValuesSampled, topValues, sampleCount, totalDocuments } = stats!;

const layerList: VectorLayerDescriptor[] = useMemo(
() => [getChoroplethTopValuesLayer(fieldName || '', topValues || [], suggestion)],
[suggestion, fieldName, topValues]
);

const countsElement = totalDocuments ? (
<EuiText color="subdued" size="xs">
{isTopValuesSampled ? (
<FormattedMessage
id="xpack.dataVisualizer.dataGrid.fieldExpandedRow.choroplethMapTopValues.calculatedFromSampleRecordsLabel"
defaultMessage="Calculated from {sampledDocumentsFormatted} sample {sampledDocuments, plural, one {record} other {records}}."
Review comment (Contributor):

These counts are also 0 in the file data viz:

[image]

Reply (Member Author):

Fixed here 7d7ba42 (#144646)


values={{
sampledDocuments: sampleCount,
sampledDocumentsFormatted: (
<strong>
{fieldFormats
.getDefaultInstance(KBN_FIELD_TYPES.NUMBER, [ES_FIELD_TYPES.INTEGER])
.convert(sampleCount)}
</strong>
),
}}
/>
) : (
<FormattedMessage
id="xpack.dataVisualizer.dataGrid.fieldExpandedRow.choroplethMapTopValues.calculatedFromTotalRecordsLabel"
defaultMessage="Calculated from {totalDocumentsFormatted} {totalDocuments, plural, one {record} other {records}}."
values={{
totalDocuments,
totalDocumentsFormatted: (
<strong>
{fieldFormats
.getDefaultInstance(KBN_FIELD_TYPES.NUMBER, [ES_FIELD_TYPES.INTEGER])
.convert(totalDocuments ?? 0)}
</strong>
),
}}
/>
)}
</EuiText>
) : null;

return (
<ExpandedRowPanel
dataTestSubj={'fileDataVisualizerChoroplethMapTopValues'}
@@ -114,20 +158,7 @@ export const ChoroplethMap: FC<Props> = ({ stats, suggestion }) => {
<EmbeddedMapComponent layerList={layerList} />
</div>

{isTopValuesSampled === true && (
<div>
<EuiSpacer size={'s'} />
<EuiText size="xs" textAlign={'center'}>
<FormattedMessage
id="xpack.dataVisualizer.dataGrid.fieldExpandedRow.choroplethMapTopValues.calculatedFromSampleDescription"
defaultMessage="Calculated from sample of {topValuesSamplerShardSize} documents per shard"
values={{
topValuesSamplerShardSize,
}}
/>
</EuiText>
</div>
)}
{countsElement}
</ExpandedRowPanel>
);
};
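
The branching in `countsElement` can be read as a plain function, which may help when checking the zero-count case raised in the review comment. This is a sketch only: `countsLabel` is a hypothetical name, the `en-US` number formatting stands in for the Kibana field formatter, and the `count === 1` check stands in for the ICU `plural` rule in the message.

```typescript
interface CountsInput {
  isTopValuesSampled?: boolean;
  sampleCount?: number;
  totalDocuments?: number;
}

// Hypothetical plain-text version of the JSX above.
function countsLabel(input: CountsInput): string | null {
  const { isTopValuesSampled, sampleCount, totalDocuments } = input;

  // Mirrors the component: nothing is rendered when totalDocuments is 0 or undefined.
  if (!totalDocuments) return null;

  // Sampled stats report the sample size; otherwise the total is reported.
  const count = isTopValuesSampled ? (sampleCount ?? 0) : totalDocuments;
  const formatted = new Intl.NumberFormat('en-US').format(count);
  const noun = count === 1 ? 'record' : 'records';

  return isTopValuesSampled
    ? `Calculated from ${formatted} sample ${noun}.`
    : `Calculated from ${formatted} ${noun}.`;
}
```

Note that a sampled result with an undefined `sampleCount` would read "Calculated from 0 sample records." in this sketch, which matches the kind of zero count reported for the file data visualizer.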