Updates anomaly detection job terminology in Stack Overview (#444)

lcawl authored Jul 29, 2019
1 parent 41ff5af commit 141b963

Showing 9 changed files with 61 additions and 60 deletions.
6 changes: 3 additions & 3 deletions docs/en/stack/ml/anomaly-detection/api-quickref.asciidoc

@@ -2,15 +2,15 @@
 [[ml-api-quickref]]
 == API quick reference
 
-All {ml} endpoints have the following base:
+All {ml} {anomaly-detect} endpoints have the following base:
 
 [source,js]
 ----
 /_ml/
 ----
 // NOTCONSOLE
 
-The main {ml} resources can be accessed with a variety of endpoints:
+The main resources can be accessed with a variety of endpoints:
 
 * {ref}/ml-apis.html#ml-api-job-endpoint[+/anomaly_detectors/+]: Create and manage {anomaly-jobs}
 * {ref}/ml-apis.html#ml-api-calendar-endpoint[+/calendars/+]: Create and manage calendars and scheduled events
@@ -19,4 +19,4 @@ The main {ml} resources can be accessed with a variety of endpoints:
 * {ref}/ml-apis.html#ml-api-result-endpoint[+/results/+]: Access the results of an {anomaly-job}
 * {ref}/ml-apis.html#ml-api-snapshot-endpoint[+/model_snapshots/+]: Manage model snapshots
 
-For a full list, see {ref}/ml-apis.html[Machine learning APIs].
+For a full list, see {ref}/ml-apis.html[{ml-cap} {anomaly-detect} APIs].
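
For context, requests against these endpoints take the following shape (an
illustrative sketch; the job name `my-job` is hypothetical):

[source,js]
----
GET _ml/anomaly_detectors
GET _ml/anomaly_detectors/my-job
GET _ml/anomaly_detectors/my-job/results/buckets
----
// NOTCONSOLE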
24 changes: 11 additions & 13 deletions docs/en/stack/ml/anomaly-detection/buckets.asciidoc

@@ -8,20 +8,18 @@
 The {ml-features} use the concept of a _bucket_ to divide the time series
 into batches for processing.
 
-The _bucket span_ is part of the configuration information for a job. It defines
-the time interval that is used to summarize and model the data. This is
-typically between 5 minutes to 1 hour and it depends on your data characteristics.
-When you set the bucket span, take into account the granularity at which you
-want to analyze, the frequency of the input data, the typical duration of the
-anomalies, and the frequency at which alerting is required.
+The _bucket span_ is part of the configuration information for an {anomaly-job}.
+It defines the time interval that is used to summarize and model the data. This
+is typically between 5 minutes and 1 hour and it depends on your data
+characteristics. When you set the bucket span, take into account the granularity
+at which you want to analyze, the frequency of the input data, the typical
+duration of the anomalies, and the frequency at which alerting is required.
 
 When you view your {ml} results, each bucket has an anomaly score. This score is
 a statistically aggregated and normalized view of the combined anomalousness of
 all the record results in the bucket.
 
-In 6.5 and later releases, the {ml} analytics enhance the anomaly score for each
-bucket by considering
-//TBD: preceding?
+The {ml} analytics enhance the anomaly score for each bucket by considering
 contiguous buckets. This extra _multi-bucket analysis_ effectively uses a
 sliding window to evaluate the events in each bucket relative to the larger
 context of recent events. When you review your {ml} results, there is a
@@ -37,9 +35,9 @@ In this example, you can see that some of the anomalies fall within the shaded
 blue area, which represents the bounds for the expected values. The bounds are
 calculated per bucket, but multi-bucket analysis is not limited by that scope.
 
-If you have more than one job, you can
-also obtain overall bucket results, which combine and correlate anomalies from
-multiple jobs into an overall score. When you view the results for job groups
-in {kib}, it provides the overall bucket scores. For more information, see
+If you have more than one {anomaly-job}, you can also obtain overall bucket
+results, which combine and correlate anomalies from multiple jobs into an
+overall score. When you view the results for job groups in {kib}, it provides
+the overall bucket scores. For more information, see
 {ref}/ml-results-resource.html[Results resources] and
 {ref}/ml-get-overall-buckets.html[Get overall buckets API].
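
To make the bucket span concrete, a minimal job configuration and an overall
buckets query might look as follows (the job names, field, and 15-minute span
are hypothetical):

[source,js]
----
PUT _ml/anomaly_detectors/my-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "mean", "field_name": "responsetime" }
    ]
  },
  "data_description": { "time_field": "timestamp" }
}

GET _ml/anomaly_detectors/job-1,job-2/results/overall_buckets
----
// NOTCONSOLE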
15 changes: 8 additions & 7 deletions docs/en/stack/ml/anomaly-detection/calendars.asciidoc

@@ -8,17 +8,18 @@ identify these events in advance, no anomalies are generated during that period.
 The {ml} model is not ill-affected and you do not receive spurious results.
 
 You can create calendars and scheduled events in the **Settings** pane on the
-**Machine Learning** page in {kib} or by using {ref}/ml-apis.html[{ml} APIs].
+**Machine Learning** page in {kib} or by using
+{ref}/ml-apis.html[{ml-cap} {anomaly-detect} APIs].
 
 A scheduled event must have a start time, end time, and description. In general,
 scheduled events are short in duration (typically lasting from a few hours to a
 day) and occur infrequently. If you have regularly occurring events, such as
 weekly maintenance periods, you do not need to create scheduled events for these
 circumstances; they are already handled by the {ml} analytics.
 
-You can identify zero or more scheduled events in a calendar. Jobs can then
-subscribe to calendars and the {ml} analytics handle all subsequent scheduled
-events appropriately.
+You can identify zero or more scheduled events in a calendar. {anomaly-jobs-cap}
+can then subscribe to calendars and the {ml} analytics handle all subsequent
+scheduled events appropriately.
 
 If you want to add multiple scheduled events at once, you can import an
 iCalendar (`.ics`) file in {kib} or a JSON file in the
@@ -27,13 +28,13 @@ iCalendar (`.ics`) file in {kib} or a JSON file in the
 [NOTE]
 --
 
-* You must identify scheduled events before your job analyzes the data for that
-time period. Machine learning results are not updated retroactively.
+* You must identify scheduled events before your {anomaly-job} analyzes the data
+for that time period. Machine learning results are not updated retroactively.
 * If your iCalendar file contains recurring events, only the first occurrence is
 imported.
 * Bucket results are generated during scheduled events but they have an
 anomaly score of zero. For more information about bucket results, see
-{ref}/ml-results-resource.html[Results Resources].
+{ref}/ml-results-resource.html[Results resources].
 * If you use long or frequent scheduled events, it might take longer for the
 {ml} analytics to learn to model your data and some anomalous behavior might be
 missed.
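
As a rough sketch of that API flow (the calendar ID, job name, and event are
hypothetical; start and end times are epoch milliseconds):

[source,js]
----
PUT _ml/calendars/planned-outages

PUT _ml/calendars/planned-outages/jobs/my-job

POST _ml/calendars/planned-outages/events
{
  "events": [
    {
      "description": "quarterly maintenance window",
      "start_time": 1562196000000,
      "end_time": 1562203200000
    }
  ]
}
----
// NOTCONSOLE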
14 changes: 7 additions & 7 deletions docs/en/stack/ml/anomaly-detection/datafeeds.asciidoc

@@ -2,18 +2,18 @@
 [[ml-dfeeds]]
 === {dfeeds-cap}
 
-Machine learning jobs can analyze data that is stored in {es} or data that is
+{anomaly-jobs-cap} can analyze data that is stored in {es} or data that is
 sent from some other source via an API. _{dfeeds-cap}_ retrieve data from {es}
 for analysis, which is the simpler and more common scenario.
 
-If you create jobs in {kib}, you must use {dfeeds}. When you create a job, you
-select an index pattern and {kib} configures the {dfeed} for you under the
-covers. If you use {ml} APIs instead, you can create a {dfeed} by using the
-{ref}/ml-put-datafeed.html[create {dfeeds} API] after you create a job. You can
-associate only one {dfeed} with each job.
+If you create {anomaly-jobs} in {kib}, you must use {dfeeds}. When you create a
+job, you select an index pattern and {kib} configures the {dfeed} for you under
+the covers. If you use APIs instead, you can create a {dfeed} by using the
+{ref}/ml-put-datafeed.html[create {dfeeds} API] after you create an
+{anomaly-job}. You can associate only one {dfeed} with each job.
 
 For a description of all the {dfeed} properties, see
-{ref}/ml-datafeed-resource.html[Datafeed Resources].
+{ref}/ml-datafeed-resource.html[Datafeed resources].
 
 To start retrieving data from {es}, you must start the {dfeed}. When you start
 it, you can optionally specify start and end times. If you do not specify an
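
A minimal sketch of creating and starting a {dfeed} (the job, {dfeed}, and
index names are hypothetical):

[source,js]
----
PUT _ml/datafeeds/datafeed-my-job
{
  "job_id": "my-job",
  "indices": ["server-metrics"]
}

POST _ml/datafeeds/datafeed-my-job/_start
{
  "start": "2019-07-01T00:00:00Z"
}
----
// NOTCONSOLE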
20 changes: 10 additions & 10 deletions docs/en/stack/ml/anomaly-detection/forecasting.asciidoc

@@ -15,7 +15,8 @@ your disk utilization will reach 100% before the end of next week.
 
 Each forecast has a unique ID, which you can use to distinguish between forecasts
 that you created at different times. You can create a forecast by using the
-{ref}/ml-forecast.html[Forecast Jobs API] or by using {kib}. For example:
+{ref}/ml-forecast.html[forecast {anomaly-jobs} API] or by using {kib}. For
+example:
 
 
 [role="screenshot"]
@@ -34,19 +35,17 @@ Eventually if the confidence levels are too low, the forecast stops.
 You can also optionally specify when the forecast expires. By default, it
 expires in 14 days and is deleted automatically thereafter. You can specify a
 different expiration period by using the `expires_in` parameter in the
-{ref}/ml-forecast.html[Forecast Jobs API].
-
-//Add examples of forecast_request_stats and forecast documents?
+{ref}/ml-forecast.html[forecast {anomaly-jobs} API].
 
 There are some limitations that affect your ability to create a forecast:
 
 * You can generate only three forecasts concurrently. There is no limit to the
 number of forecasts that you retain. Existing forecasts are not overwritten when
 you create new forecasts. Rather, they are automatically deleted when they expire.
-* If you use an `over_field_name` property in your job (that is to say, it's a
-_population job_), you cannot create a forecast.
-* If you use any of the following analytical functions in your job, you
-cannot create a forecast:
+* If you use an `over_field_name` property in your {anomaly-job} (that is to say,
+it's a _population job_), you cannot create a forecast.
+* If you use any of the following analytical functions in your {anomaly-job},
+you cannot create a forecast:
 ** `lat_long`
 ** `rare` and `freq_rare`
 ** `time_of_day` and `time_of_week`
@@ -56,9 +55,10 @@ For more information about any of these functions, see <<ml-functions>>.
 --
 * Forecasts run concurrently with real-time {ml} analysis. That is to say, {ml}
 analysis does not stop while forecasts are generated. Forecasts can have an
-impact on {ml} jobs, however, especially in terms of memory usage. For this
+impact on {anomaly-jobs}, however, especially in terms of memory usage. For this
 reason, forecasts run only if the model memory status is acceptable.
-* The job must be open when you create a forecast. Otherwise, an error occurs.
+* The {anomaly-job} must be open when you create a forecast. Otherwise, an error
+occurs.
 * If there is insufficient data to generate any meaningful predictions, an
 error occurs. In general, forecasts that are created early in the learning phase
 of the data analysis are less accurate.
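
A sketch of a forecast request with an explicit expiration (the job name and
durations are hypothetical):

[source,js]
----
POST _ml/anomaly_detectors/my-job/_forecast
{
  "duration": "3d",
  "expires_in": "7d"
}
----
// NOTCONSOLE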
12 changes: 6 additions & 6 deletions docs/en/stack/ml/anomaly-detection/jobs.asciidoc

@@ -1,16 +1,16 @@
 [role="xpack"]
 [[ml-jobs]]
-=== Machine learning jobs
+=== {anomaly-jobs-cap}
 ++++
 <titleabbrev>Jobs</titleabbrev>
 ++++
 
-Machine learning jobs contain the configuration information and metadata
+{anomaly-jobs-cap} contain the configuration information and metadata
 necessary to perform an analytics task.
 
-Each job has one or more _detectors_. A detector applies an analytical function
-to specific fields in your data. For more information about the types of
-analysis you can perform, see <<ml-functions>>.
+Each {anomaly-job} has one or more _detectors_. A detector applies an analytical
+function to specific fields in your data. For more information about the types
+of analysis you can perform, see <<ml-functions>>.
 
 A job can also contain properties that affect which types of entities or events
 are considered anomalous. For example, you can specify whether entities are
@@ -20,7 +20,7 @@ categories and partitions. Some of these more advanced job configurations
 are described in the following section: <<ml-configuring>>.
 
 For a description of all the job properties, see
-{ref}/ml-job-resource.html[Job Resources].
+{ref}/ml-job-resource.html[{anomaly-job-cap} resources].
 
 In {kib}, there are wizards that help you create specific types of jobs, such
 as _single metric_, _multi-metric_, and _population_ jobs. A single metric job
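
To illustrate detectors, a bare-bones {anomaly-job} with two detectors might
look like this (the job name, fields, and chosen functions are hypothetical):

[source,js]
----
PUT _ml/anomaly_detectors/web-traffic
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      { "function": "count" },
      { "function": "high_mean", "field_name": "bytes", "by_field_name": "host" }
    ]
  },
  "data_description": { "time_field": "@timestamp" }
}
----
// NOTCONSOLE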
2 changes: 1 addition & 1 deletion docs/en/stack/ml/anomaly-detection/limitations.asciidoc

@@ -1,6 +1,6 @@
 [role="xpack"]
 [[ml-limitations]]
-== Machine learning limitations
+== {ml-cap} {anomaly-detect} limitations
 [subs="attributes"]
 ++++
 <titleabbrev>Limitations</titleabbrev>
4 changes: 2 additions & 2 deletions docs/en/stack/ml/anomaly-detection/rules.asciidoc

@@ -4,7 +4,7 @@
 
 By default, as described in <<ml-analyzing>>, anomaly detection is unsupervised
 and the {ml} models have no awareness of the domain of your data. As a result,
-{ml} jobs might identify events that are statistically significant but are
+{anomaly-jobs} might identify events that are statistically significant but are
 uninteresting when you know the larger context. Machine learning custom rules
 enable you to customize anomaly detection.
 
@@ -22,7 +22,7 @@ for the rule, such that it applies only to certain machines. The scope is
 defined by using {ml} filters.
 
 _Filters_ contain a list of values that you can use to include or exclude events
-from the {ml} analysis. You can use the same filter in multiple jobs.
+from the {ml} analysis. You can use the same filter in multiple {anomaly-jobs}.
 
 If you are analyzing web traffic, you might create a filter that contains a list
 of IP addresses. For example, maybe they are IP addresses that you trust to
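
A sketch of creating such a filter through the API (the filter ID and
addresses are hypothetical):

[source,js]
----
PUT _ml/filters/trusted_ips
{
  "description": "IP addresses trusted to access the service",
  "items": ["198.51.100.1", "198.51.100.2"]
}
----
// NOTCONSOLE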
24 changes: 13 additions & 11 deletions docs/en/stack/ml/anomaly-detection/troubleshooting.asciidoc

@@ -1,6 +1,6 @@
 [role="xpack"]
 [[ml-troubleshooting]]
-== Troubleshooting {ml}
+== Troubleshooting {ml} {anomaly-detect}
 ++++
 <titleabbrev>Troubleshooting</titleabbrev>
 ++++
@@ -42,7 +42,7 @@ For more information, see {ref}/rolling-upgrades.html[Rolling upgrades].
 [[ml-mappingclash]]
 === Job creation failure due to mapping clash
 
-This problem occurs when you try to create a job.
+This problem occurs when you try to create an {anomaly-job}.
 
 *Symptoms:*
 
@@ -61,15 +61,16 @@ data types or different `fields` settings.
 
 By default, {ml} results are stored in the `.ml-anomalies-shared` index in {es}.
 To resolve this issue, click *Advanced > Use dedicated index* when you create
-the job in {kib}. If you are using the create job API, specify an index name in
-the `results_index_name` property.
+the job in {kib}. If you are using the create {anomaly-jobs} API, specify an
+index name in the `results_index_name` property.
 
 [[ml-jobnames]]
 === {kib} cannot display jobs with invalid characters in their name
 
-This problem occurs when you create a job by using the
-{ref}/ml-put-job.html[Create Jobs API] then try to view that job in {kib}. In
-particular, the problem occurs when you use a period(.) in the job identifier.
+This problem occurs when you create an {anomaly-job} by using the
+{ref}/ml-put-job.html[Create {anomaly-jobs} API] and then try to view that job
+in {kib}. In particular, the problem occurs when you use a period (.) in the
+job identifier.
 
 *Symptoms:*
 
@@ -82,10 +83,11 @@ abbreviated name, it is displayed.
 
 *Resolution:*
 
-Create jobs in {kib} or ensure that you create jobs with valid identifiers when
-you use the {ml} APIs. For more information about valid identifiers, see
-{ref}/ml-put-job.html[Create Jobs API] or
-{ref}/ml-job-resource.html[Job Resources].
+Create {anomaly-jobs} in {kib} or ensure that you create {anomaly-jobs} with
+valid identifiers when you use the APIs. For more information about valid
+identifiers, see
+{ref}/ml-put-job.html[Create {anomaly-jobs} API] or
+{ref}/ml-job-resource.html[{anomaly-detect-cap} job resources].
 
 [[ml-upgradedf]]
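
For the dedicated-index resolution, the request might look like this (the job
name and index name are hypothetical):

[source,js]
----
PUT _ml/anomaly_detectors/my-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "count" } ]
  },
  "data_description": { "time_field": "timestamp" },
  "results_index_name": "my-job-results"
}
----
// NOTCONSOLE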
