[ML] Add support for per-partition categorization jobs #74592

Closed
wants to merge 36 commits into from

Member

@qn895 qn895 commented Aug 6, 2020

Summary

Part of meta issue #73968. This PR adds the ability to enable per-partition categorization in the categorization job creation wizard.

UI-related changes

  • Categorization job creation wizard
    • Navigating to next screen and back should persist the settings
    • Switching to a different type of detector after setting the per-partition field name should persist the partition_field_name in the detector config
    • Turning off per_partition should also set stop_on_warn to false

Screen Shot 2020-08-06 at 3 12 11 PM

  • Advanced job creation wizard (when a category field is selected)
    2020-08-06 at 3 12 PM

  • If viewing the results of a per-partition categorization job in the Anomaly Explorer, and if stop_on_warn was enabled, add an indicator somewhere in the view if the categorization status goes to warn for a partition. This is needed to explain to the user that there may be fewer results than there could have been.

Screen Shot 2020-08-06 at 12 50 43 PM

API-related

  • Added api/ml/anomaly_detectors/{jobId}/categorizer_stats for retrieving categorizer_stats documents
  • Added api/ml/anomaly_detectors/{jobId}/stopped_partitions to find out which partitions we stopped categorizing (searching for categorizer_stats documents in the job's results index)
1. First get the job config and check if analysis_config.per_partition_categorization.stop_on_warn is true
2. If so, search for categorizer_stats documents for the current job where the categorization_status is warn
3. Return all the partition_field_value values from the documents found
If analysis_config.per_partition_categorization.stop_on_warn is false then we won’t have stopped categorizing anything, so you can return an empty list
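
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `JobConfig`, `CategorizerStatsDoc`, `Deps` and the injected `getJobConfig` / `searchCategorizerStats` functions are hypothetical stand-ins for the real Elasticsearch calls, not the code in this PR.

```typescript
// Hypothetical, simplified shapes; the real job config and results documents are richer.
interface JobConfig {
  job_id: string;
  analysis_config: {
    per_partition_categorization?: { enabled?: boolean; stop_on_warn?: boolean };
  };
}

interface CategorizerStatsDoc {
  job_id: string;
  categorization_status: 'ok' | 'warn';
  partition_field_value?: string;
}

// Injected lookups stand in for the Elasticsearch requests (assumed, not a real client API).
interface Deps {
  getJobConfig(jobId: string): Promise<JobConfig | undefined>;
  searchCategorizerStats(jobId: string, status: 'ok' | 'warn'): Promise<CategorizerStatsDoc[]>;
}

async function getStoppedPartitions(jobId: string, deps: Deps): Promise<string[]> {
  // Step 1: read the job config and check stop_on_warn.
  const job = await deps.getJobConfig(jobId);
  if (job === undefined) {
    throw new Error(`Unable to find anomaly detector job ${jobId}`);
  }
  const stopOnWarn = job.analysis_config.per_partition_categorization?.stop_on_warn === true;
  if (!stopOnWarn) {
    // Nothing can have been stopped, so skip the search and return an empty list.
    return [];
  }

  // Step 2: find categorizer_stats documents for this job whose status is warn.
  const docs = await deps.searchCategorizerStats(jobId, 'warn');

  // Step 3: collect the distinct partition_field_value values.
  const values = new Set<string>();
  for (const doc of docs) {
    if (doc.partition_field_value !== undefined) {
      values.add(doc.partition_field_value);
    }
  }
  return Array.from(values);
}
```

Injecting the two lookups keeps the stop_on_warn short-circuit testable without a cluster; the real route would pass functions backed by the ES client instead.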

Checklist

@@ -696,4 +699,166 @@ export function jobRoutes({ router, mlLicense }: RouteInitialization) {
}
})
);

/**
* @apiGroup AnomalyDetectors
Member

@jgowdyelastic jgowdyelastic Aug 6, 2020

these endpoints are better suited to job_service, as this file only contains wrappers for the anomaly_detectors es endpoints.
there is a categorisation section inside here: https://github.com/elastic/kibana/tree/master/x-pack/plugins/ml/server/models/job_service/new_job/categorization
but as these are used for results, they might need a different folder, or maybe should live in results_service

Member Author

That's a good point. Now that I see it, I think it's probably more suitable for results_service. Will update it to be that group instead. Thanks James!

qn895 added 4 commits August 9, 2020 21:35
…r-partition

# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
},
});
}
const results: SearchResponse<AnomalyCategorizerStatsDoc> = await callAsCurrentUser('search', {
Member

all searches to ML_RESULTS_INDEX_PATTERN should be performed by the internal user.

@qn895 qn895 marked this pull request as ready for review August 10, 2020 22:41
@qn895 qn895 requested a review from a team as a code owner August 10, 2020 22:41
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

Member

@pheyos pheyos left a comment

Thanks for adding API tests as part of this PR! 🎉
Left a few comments.

qn895 added 2 commits August 11, 2020 11:39
…r-partition

# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
@qn895
Member Author

qn895 commented Aug 11, 2020

Updated PR with the latest changes and I think it's ready for a final look @jgowdyelastic, @alvarezmelissa87, @pheyos 🙏

{stoppedPartitions && (
<EuiCallOut
size={'s'}
title={i18n.translate('xpack.ml.explorer.stoppedPartitionsExistCallout', {
Contributor

It's recommended to use { FormattedMessage } from '@kbn/i18n/react'; inside of components instead of i18n.

Member Author

Updated all the i18n.translate calls to use FormattedMessage here 3686b73

@@ -57,6 +57,7 @@ export class JobCreator {
private _stopAllRefreshPolls: {
stop: boolean;
} = { stop: false };
private _partitionField: string | null;
Contributor

nit: use

Suggested change
private _partitionField: string | null;
private _partitionField: string | null = null;

and there is no need to define it in the constructor.

Member Author

After discussing with James, we decided to remove this _partitionField altogether here 0a813e1

@@ -21,6 +27,13 @@ export const ExtraSettings: FC = () => {
<SummaryCountField />
</EuiFlexItem>
</EuiFlexGroup>
{showCategorizationPerPartitionField && (
<EuiFlexGroup gutterSize="xl">
Contributor

do you need a flex group just for one item?

Member Author

Removed here a94ecaf

CategorizationJobCreator,
isCategorizationJobCreator,
} from '../../../../../common/job_creator';
import { newJobCapsService } from '../../../../../../../services/new_job_capabilities_service';
Contributor

Please avoid using static imports for services. Put them in a context instead.

Member

i agree this would be better in a context, however newJobCapsService is used throughout the wizards and so i'd suggest this happens in a refactoring PR later on.

Member Author

I'll save this issue for a follow up PR like James mentioned

Comment on lines 26 to 42
const options: EuiComboBoxOptionOption[] = [
...createFieldOptions(fields, jobCreator.additionalFields),
];

const selection: EuiComboBoxOptionOption[] = [];
if (selectedField !== null) {
selection.push({ label: selectedField });
}

function onChange(selectedOptions: EuiComboBoxOptionOption[]) {
const option = selectedOptions[0];
if (typeof option !== 'undefined') {
changeHandler(option.label);
} else {
changeHandler(null);
}
}
Contributor

It all gets invoked on each render. Please use hooks accordingly (useMemo and useCallback)

Member Author

Updated here a94ecaf

}
} catch (error) {
// eslint-disable-next-line no-console
console.error(error);
Contributor

do we want to fail silently here?

Member Author

I think it's okay to fail silently in this scenario, since it's only there to give a warning/info message if one of the partitions has stopped. But if we think it's important for the user to know, I can add an error message.

});

if (!jobConfigResponse || jobConfigResponse.jobs.length < 1) {
throw Error(`Unable to find anomaly detector jobs ${jobIds.join(', ')}`);
Contributor

shall we use throw Boom.notFound instead? to return an error with appropriate code

Member Author

Updated here a94ecaf

* @api {get} /api/ml/results/:jobId/categorizer_stats
* @apiName GetStoppedPartitions
* @apiDescription Returns list of partitions we stopped categorizing when status changed to warn
* @apiSchema (params) jobIdSchema
Contributor

should be

Suggested change
* @apiSchema (params) jobIdSchema
* @apiSchema (body) getStoppedPartitionsSchema

Member Author

Updated here a94ecaf

/**
* @apiGroup ResultsService
*
* @api {get} /api/ml/results/:jobId/categorizer_stats
Contributor

should be

Suggested change
* @api {get} /api/ml/results/:jobId/categorizer_stats
* @api {post} /api/ml/results/stopped_partitions

Member Author

Updated here a94ecaf

/**
* @apiGroup ResultsService
*
* @api {get} /api/ml/results/:jobId/categorizer_stats
Contributor

It requires some brief description for docs to be properly generated, please check other examples

Member Author

Updated here a94ecaf

this._job_config.analysis_config.per_partition_categorization!.enabled = enabled;
}

public get partitionStopOnWarn() {
Member

should this name be perPartitionStopOnWarn ?

Member Author

Updated here a94ecaf

@@ -81,6 +82,7 @@ export class JobCreator {
}

this._datafeed_config.query = query;
this._partitionField = null;
Member

keeping a separate copy of the partition field name will lead to this becoming out of sync with the real partition field name in the job config, if the user edits the job's JSON.
i think we should be using the partition field from the detector which uses the mlcategory field.

this will make the get categorizationPerPartitionField() function trickier as it'll have to find that field each time it is called.
Perhaps a workaround would be to store the index of the categorisation detector in this class for fast look up. in the categorisation wizard that will always be 0.

I don't see how we can deal with multiple categorisation detectors with differing partition fields in the UI. Even though it is possible to configure in the JSON.

Member Author

Following our conversation, I have removed this._partitionField altogether. Storing the index could lead to the same issue, with the index getting out of sync with the detectors (e.g. after deleting one of the detectors), so I have changed the getter to find the first detector where the keyword mlcategory exists. Since the partition_field_name property has to have the same value in every detector that uses the keyword mlcategory, I think this change is more suitable.
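
For readers following along, a minimal sketch of the getter behaviour described in this reply could look like the code below. The `Detector` interface and the `findCategorizationPartitionField` helper are hypothetical names for illustration, and the `MLCATEGORY` value is assumed; the actual PR code may differ.

```typescript
// Assumed value of the shared constant for the categorization keyword.
const MLCATEGORY = 'mlcategory';

// Hypothetical, trimmed-down detector shape.
interface Detector {
  function: string;
  by_field_name?: string;
  partition_field_name?: string;
}

// Look the partition field up from the detectors on every call, so it cannot
// drift out of sync with the job config if the user edits the JSON.
function findCategorizationPartitionField(detectors: Detector[]): string | null {
  const firstCategorizationDetector = detectors.find((d) => d.by_field_name === MLCATEGORY);
  // A missing detector or a missing partition_field_name both map to null.
  return firstCategorizationDetector?.partition_field_name ?? null;
}
```

The lookup is linear in the number of detectors, which is the trade-off against caching either the field name or the detector index.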

import { EuiDescribedFormGroup } from '@elastic/eui';

interface Props {
isOptional: boolean;
Member

isOptional isn't needed for this component

Member Author

Updated in a94ecaf

jobCreator.perPartitionCategorization &&
jobCreator.categorizationPerPartitionField === null
) {
jobCreator.categorizationPerPartitionField = filteredCategories[0].id;
Member

just to be safe, it would be worth adding a check for filteredCategories.length before access it.

Member Author

Updated in a94ecaf
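
A rough sketch of the guard suggested in this thread, assuming hypothetical `Field` and `CreatorState` shapes rather than the real job creator class:

```typescript
// Hypothetical shapes for illustration only.
interface Field {
  id: string;
}

interface CreatorState {
  perPartitionCategorization: boolean;
  categorizationPerPartitionField: string | null;
}

// Default the partition field to the first candidate, but only when a candidate exists.
function applyDefaultPartitionField(state: CreatorState, filteredCategories: Field[]): void {
  if (
    state.perPartitionCategorization &&
    state.categorizationPerPartitionField === null &&
    filteredCategories.length > 0
  ) {
    state.categorizationPerPartitionField = filteredCategories[0].id;
  }
}
```

With an empty `filteredCategories` the state is left untouched instead of throwing on `filteredCategories[0].id`.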

jobIds,
fieldToBucket,
});
return httpService.http<any>({
Member

looks like this returns a GetStoppedPartitionResult
That interface should be moved to a common location and used both server side and here.

Member Author

Updated in a94ecaf

@@ -71,6 +76,19 @@ function getPartitionFieldsValues(legacyClient: ILegacyScopedClusterClient, payl
return rs.getPartitionFieldsValues(jobId, searchTerm, criteriaFields, earliestMs, latestMs);
}

function getCategorizerStats(context: RequestHandlerContext, params: any, query: any) {
Member

this needs updating to use legacyClient

Member Author

Updated in a94ecaf

darnautov and others added 8 commits August 12, 2020 15:02
// looping through to find current partition_field name to prevent stale/syncing issue
// possible because partition_field_name has to have same value in every detector that uses the keyword mlcategory
const firstCategorizationDetector = this._detectors.find(
(d) => d.by_field_name === 'mlcategory'
Member

we have a common constant for this string MLCATEGORY in field_types

return null;
}

public set categorizationPerPartitionField(fieldName: string | null) {
Member

this should only be changing detectors which are using mlcategory as the by_field_name
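
One way to read this suggestion, sketched with a hypothetical `Detector` shape and a free-standing helper (the real setter lives on the job creator class, and the `MLCATEGORY` value is assumed here):

```typescript
// Assumed value of the shared constant for the categorization keyword.
const MLCATEGORY = 'mlcategory';

// Hypothetical, trimmed-down detector shape.
interface Detector {
  function: string;
  by_field_name?: string;
  partition_field_name?: string;
}

// Apply (or clear) the partition field only on detectors that use mlcategory
// as their by_field_name; every other detector is left as configured.
function setCategorizationPartitionField(detectors: Detector[], fieldName: string | null): void {
  for (const detector of detectors) {
    if (detector.by_field_name !== MLCATEGORY) {
      continue;
    }
    if (fieldName === null) {
      delete detector.partition_field_name;
    } else {
      detector.partition_field_name = fieldName;
    }
  }
}
```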

);
if (
firstCategorizationDetector &&
'partition_field_name' in firstCategorizationDetector &&
Member

I don't think this check is needed as the undefined below has it covered.

delete detector.partition_field_name;
});
} else {
if (this.categorizationPerPartitionField !== fieldName) {
Member

thinking about this a bit more, this check won't catch the situation where a second mlcategory detector has been added with a different partition field.
I think we're going to struggle to add nice UI behaviour in this situation without looping over the detectors whenever one is added to check for mlcategory.

Member

Just tested this and validation fails in a bad way when we have two categorisation detectors with different partition fields.
I think we should allow this to be configured in the UI, and we should fix validation so it displays the error in the correct way.

Member

@jgowdyelastic jgowdyelastic Aug 13, 2020

After chatting to @droberts195 we agree that the advanced job wizard shouldn't have a global way to configure the partition field for all categorisation detectors.
Instead the user should do this themselves per detector.
This means the categorizationPerPartitionField can be moved out of the base JobCreator class and instead live in the derived CategorizationJobCreator class.
It also means we can put the partitionField variable back in as there will only ever be one detector in that job.
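
A hedged sketch of what this arrangement might look like. The `JobCreatorSketch` and `CategorizationJobCreatorSketch` classes, their fields, and the `detectorConfigs` accessor are hypothetical simplifications, not the actual classes in the ML plugin. The stop_on_warn reset follows the behaviour listed in the PR summary.

```typescript
// Hypothetical, trimmed-down detector shape.
interface Detector {
  function: string;
  by_field_name?: string;
  partition_field_name?: string;
}

// Base class: no global partition field for categorization detectors.
class JobCreatorSketch {
  protected detectors: Detector[] = [];
  protected perPartition = { enabled: false, stop_on_warn: false };

  public get detectorConfigs(): Detector[] {
    return this.detectors;
  }
}

// Derived class: the categorization wizard only ever has one detector,
// so keeping a local copy of the partition field is safe here.
class CategorizationJobCreatorSketch extends JobCreatorSketch {
  private partitionFieldName: string | null = null;

  constructor() {
    super();
    this.detectors = [{ function: 'count', by_field_name: 'mlcategory' }];
  }

  public get categorizationPerPartitionField(): string | null {
    return this.partitionFieldName;
  }

  public set categorizationPerPartitionField(fieldName: string | null) {
    this.partitionFieldName = fieldName;
    const detector = this.detectors[0];
    if (fieldName === null) {
      delete detector.partition_field_name;
    } else {
      detector.partition_field_name = fieldName;
    }
  }

  public get perPartitionCategorization(): boolean {
    return this.perPartition.enabled;
  }

  public set perPartitionCategorization(enabled: boolean) {
    this.perPartition.enabled = enabled;
    if (!enabled) {
      // Turning off per-partition categorization also turns off stop_on_warn.
      this.perPartition.stop_on_warn = false;
    }
  }

  public get perPartitionStopOnWarn(): boolean {
    return this.perPartition.stop_on_warn;
  }

  public set perPartitionStopOnWarn(enabled: boolean) {
    this.perPartition.stop_on_warn = enabled;
  }
}
```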

Member

@pheyos pheyos left a comment

Functional tests LGTM

darnautov and others added 8 commits August 13, 2020 14:37
…r-partition

# Conflicts:
#	docs/development/plugins/data/public/kibana-plugin-plugins-data-public.searchinterceptor.md
#	src/plugins/data/server/index.ts
#	src/plugins/data/server/server.api.md
#	x-pack/plugins/reporting/server/routes/jobs.ts
#	x-pack/plugins/security_solution/public/resolver/test_utilities/simulator/index.tsx
#	x-pack/plugins/security_solution/public/resolver/view/clickthrough.test.tsx
#	x-pack/plugins/security_solution/public/resolver/view/panel.test.tsx
#	x-pack/test/functional/apps/reporting_management/report_listing.ts
@kibanamachine
Contributor

kibanamachine commented Aug 14, 2020

💔 Build Failed

Failed CI Steps

Build metrics

@kbn/optimizer bundle module count

id value diff baseline
maps 692 -1 693
ml 1179 -176 1355
total -177

async chunks size

id value diff baseline
maps 3.3MB +21.2KB 3.3MB
ml 7.9MB -71.2KB 8.0MB
securitySolution 7.2MB -1.0KB 7.2MB
visTypeVega 1.4MB -278.0B 1.4MB
total -51.3KB

page load bundle size

id value diff baseline
data 1.4MB -1.7KB 1.4MB
dataEnhanced 177.9KB -111.0B 178.0KB
lens 843.9KB -14.0KB 858.0KB
ml 523.2KB -50.0KB 573.2KB
securitySolution 805.8KB -72.0B 805.9KB
visTypeVega 661.1KB -220.0B 661.3KB
total -66.1KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@qn895
Member Author

qn895 commented Aug 14, 2020

Seems like I have accidentally pulled Dima's PR in as well with my updates yesterday. Will close this PR and create a new one. My apologies for the inconvenience.

6 participants