From dbda7f4554a5992e8c6fbd7818f1b7610476bacb Mon Sep 17 00:00:00 2001 From: Tim Brooks Date: Mon, 25 Feb 2019 15:21:33 -0700 Subject: [PATCH 1/7] WIP --- docs/reference/ccr/remote-recovery.asciidoc | 38 +++++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 docs/reference/ccr/remote-recovery.asciidoc diff --git a/docs/reference/ccr/remote-recovery.asciidoc b/docs/reference/ccr/remote-recovery.asciidoc new file mode 100644 index 0000000000000..89f971ef7b91a --- /dev/null +++ b/docs/reference/ccr/remote-recovery.asciidoc @@ -0,0 +1,38 @@ +[[remote-recovery]] +=== Remote Recovery + +<> is the process used to build a new copy of a +shard on a follower node by copying data from the primary shard in the leader +cluster. {es} uses this remote recovery process to bootstrap a follower index +using the data from the leader index. + + + +rebuild shard copies that were +lost if a node has failed, and uses the same process when migrating a shard copy between nodes to rebalance the +cluster or to honor any changes to the <>. + +The following _expert_ setting can be set to manage the resources consumed by +remote recoveries: + +`ccr.indices.recovery.max_bytes_per_sec`:: + Limits the total inbound and outbound remote recovery traffic on each node. + Since this limit applies on each node, but there may be many nodes + performing remote recoveries concurrently, the total amount of peer recovery + traffic within a cluster may be much higher than this limit. If you set + this limit too high then there is a risk that ongoing peer recoveries will + consume an excess of bandwidth (or other resources) which could destabilize + the cluster. Defaults to `40mb`. + +`ccr.indices.recovery.max_concurrent_file_chunk`:: + Controls the number of file chunk requests that can be sent in parallel per recovery. + As multiple recoveries are already running in parallel (controlled by + cluster.routing.allocation.node_concurrent_recoveries), increasing this expert-level + setting might only help in situations where peer recovery of a single shard is not + reaching the total inbound and outbound peer recovery traffic as configured by + indices.recovery.max_bytes_per_sec, but is CPU-bound instead, typically when using + transport-level security or compression. Defaults to `2`. + +These settings can be dynamically updated on a live cluster with the +<> API. From 199501dbe53cd5de9ffd286e3354071db4fed5c2 Mon Sep 17 00:00:00 2001 From: Tim Brooks Date: Wed, 27 Feb 2019 15:27:45 -0700 Subject: [PATCH 2/7] Add more docs --- docs/reference/ccr/getting-started.asciidoc | 5 ++ docs/reference/ccr/remote-recovery.asciidoc | 63 ++++++++++++++------- 2 files changed, 46 insertions(+), 22 deletions(-) diff --git a/docs/reference/ccr/getting-started.asciidoc b/docs/reference/ccr/getting-started.asciidoc index 7c59b8628052f..26cc81db7fe93 100644 --- a/docs/reference/ccr/getting-started.asciidoc +++ b/docs/reference/ccr/getting-started.asciidoc @@ -253,6 +253,11 @@ PUT /server-metrics-copy/_ccr/follow?wait_for_active_shards=1 ////////////////////////// +The follower index is bootstrapped using the <> +process. The remote recovery process transfers the existing Lucene segment files +from the leader to the follower. When the remote recovery process is complete, +the index following will be initiated. + Now when you index documents into your leader index, you will see these documents replicated in the follower index. You can inspect the status of replication using the diff --git a/docs/reference/ccr/remote-recovery.asciidoc b/docs/reference/ccr/remote-recovery.asciidoc index 89f971ef7b91a..6a99270a747a1 100644 --- a/docs/reference/ccr/remote-recovery.asciidoc +++ b/docs/reference/ccr/remote-recovery.asciidoc @@ -1,17 +1,23 @@ [[remote-recovery]] === Remote Recovery -<> is the process used to build a new copy of a -shard on a follower node by copying data from the primary shard in the leader -cluster. {es} uses this remote recovery process to bootstrap a follower index -using the data from the leader index. - - - -rebuild shard copies that were -lost if a node has failed, and uses the same process when migrating a shard copy between nodes to rebalance the -cluster or to honor any changes to the <>. +Remote recovery is the process used to build a new copy of a shard on a follower +node by copying data from the primary shard in the leader cluster. {es} uses this +remote recovery process to bootstrap a follower index using the data from the +leader index. + +Remote recovery is a network intensive process that transfers all of the Lucene +segment files from the leader cluster to the follower cluster. The follower +requests that a recovery session be initiated on the primary shard in the leader +cluster. The follower then requests file chunks concurrently from the leader. By +default, the the process concurrently requests `5` large `1mb` file chunks as remote +recovery is designed to support leader and follower clusters with high network +latency between them. + +Information about an in-progress remote recovery can be obtained using the +<> api. Remote recoveries are implemented using the +<> infrastructure. This means that on-going +remote recoveries will be labelled as type `snapshot` in the recovery api. The following _expert_ setting can be set to manage the resources consumed by remote recoveries: @@ -19,20 +25,33 @@ remote recoveries: `ccr.indices.recovery.max_bytes_per_sec`:: Limits the total inbound and outbound remote recovery traffic on each node. Since this limit applies on each node, but there may be many nodes - performing remote recoveries concurrently, the total amount of peer recovery - traffic within a cluster may be much higher than this limit. If you set - this limit too high then there is a risk that ongoing peer recoveries will - consume an excess of bandwidth (or other resources) which could destabilize - the cluster. Defaults to `40mb`. + performing remote recoveries concurrently, the total amount of remote recovery bytes + may be much higher than this limit. If you set this limit too high then there + is a risk that ongoing remote recoveries will consume an excess of bandwidth + (or other resources) which could destabilize the cluster. Defaults to `40mb`. `ccr.indices.recovery.max_concurrent_file_chunk`:: Controls the number of file chunk requests that can be sent in parallel per recovery. - As multiple recoveries are already running in parallel (controlled by - cluster.routing.allocation.node_concurrent_recoveries), increasing this expert-level - setting might only help in situations where peer recovery of a single shard is not - reaching the total inbound and outbound peer recovery traffic as configured by - indices.recovery.max_bytes_per_sec, but is CPU-bound instead, typically when using - transport-level security or compression. Defaults to `2`. + As multiple remote recoveries might already running in parallel, increasing this + expert-level setting might only help in situations where remote recovery of a single shard + is not reaching the total inbound and outbound remote recovery traffic as configured by + `ccr.indices.recovery.max_bytes_per_sec`. Defaults to `5`. The maximum allowed value is + `10`. + +`ccr.indices.recovery.chunk_size`:: + Controls the chunk size requested by the follower during file transfer. Defaults to + `1mb`. + +`ccr.indices.recovery.recovery_activity_timeout`:: + Controls the timeout for recovery activity. This timeout primarily applies on the leader + cluster. The leader cluster must open resources in-memory to supply data to the follower + during the recovery process. If the leader does not receive recovery requests from the + follower for this period of time, it will close the resources. Defaults to `60 seconds`. + +`ccr.indices.recovery.internal_action_timeout`:: + Controls the timeout for individual network requests during the remote recovery + process. An individual action timing out can fail the recovery. Defaults to `60 seconds`. + These settings can be dynamically updated on a live cluster with the <> API. From 67107971b92ca4fa57dcc48aa005af0982ca5e22 Mon Sep 17 00:00:00 2001 From: Tim Brooks Date: Thu, 28 Feb 2019 12:48:40 -0700 Subject: [PATCH 3/7] Changes --- docs/reference/ccr/remote-recovery.asciidoc | 24 ++++++++++++++------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/docs/reference/ccr/remote-recovery.asciidoc b/docs/reference/ccr/remote-recovery.asciidoc index 6a99270a747a1..508fe0407c632 100644 --- a/docs/reference/ccr/remote-recovery.asciidoc +++ b/docs/reference/ccr/remote-recovery.asciidoc @@ -4,7 +4,9 @@ Remote recovery is the process used to build a new copy of a shard on a follower node by copying data from the primary shard in the leader cluster. {es} uses this remote recovery process to bootstrap a follower index using the data from the -leader index. +leader index. This allows the follower to receive a copy of the current state of +the leader index, even if a complete history of changes is not available on the +leader due to Lucene segment merging. Remote recovery is a network intensive process that transfers all of the Lucene segment files from the leader cluster to the follower cluster. The follower @@ -15,12 +17,12 @@ recovery is designed to support leader and follower clusters with high network latency between them. Information about an in-progress remote recovery can be obtained using the -<> api. Remote recoveries are implemented using the -<> infrastructure. This means that on-going -remote recoveries will be labelled as type `snapshot` in the recovery api. +<> api on the follower cluster. Remote recoveries are implemented +using the <> infrastructure. This means that +on-going remote recoveries will be labelled as type `snapshot` in the recovery api. -The following _expert_ setting can be set to manage the resources consumed by -remote recoveries: +The following setting can be used to rate-limit the data transmitted during remote +recoveries: `ccr.indices.recovery.max_bytes_per_sec`:: Limits the total inbound and outbound remote recovery traffic on each node. @@ -28,9 +30,15 @@ remote recoveries: performing remote recoveries concurrently, the total amount of remote recovery bytes may be much higher than this limit. If you set this limit too high then there is a risk that ongoing remote recoveries will consume an excess of bandwidth - (or other resources) which could destabilize the cluster. Defaults to `40mb`. + (or other resources) which could destabilize the cluster. This setting is used by both + the leader and follower clusters. For example if it is set to `20mb` on a leader, the + leader will only send `20mb/s` to the follower even if the follower is requesting and can + accept `60mb/s`. Defaults to `40mb`. + +The following _expert_ settings can be set to manage the resources consumed by +remote recoveries: -`ccr.indices.recovery.max_concurrent_file_chunk`:: +`ccr.indices.recovery.max_concurrent_file_chunks`:: Controls the number of file chunk requests that can be sent in parallel per recovery. As multiple remote recoveries might already running in parallel, increasing this expert-level setting might only help in situations where remote recovery of a single shard From b3d85563fceb408350ecc1a9b8593131099b4e7c Mon Sep 17 00:00:00 2001 From: lcawl Date: Fri, 1 Mar 2019 09:14:52 -0800 Subject: [PATCH 4/7] [DOCS] Adds remote recovery to Stack Overview --- docs/reference/ccr/index.asciidoc | 1 + docs/reference/ccr/remote-recovery.asciidoc | 10 ++++++---- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/docs/reference/ccr/index.asciidoc b/docs/reference/ccr/index.asciidoc index be281d05c05f3..4bb10e0fabfaa 100644 --- a/docs/reference/ccr/index.asciidoc +++ b/docs/reference/ccr/index.asciidoc @@ -29,3 +29,4 @@ include::overview.asciidoc[] include::requirements.asciidoc[] include::auto-follow.asciidoc[] include::getting-started.asciidoc[] +include::remote-recovery.asciidoc[] diff --git a/docs/reference/ccr/remote-recovery.asciidoc b/docs/reference/ccr/remote-recovery.asciidoc index 508fe0407c632..16a8d7e8e9ca9 100644 --- a/docs/reference/ccr/remote-recovery.asciidoc +++ b/docs/reference/ccr/remote-recovery.asciidoc @@ -1,5 +1,7 @@ +[role="xpack"] +[testenv="platinum"] [[remote-recovery]] -=== Remote Recovery +== Remote recovery Remote recovery is the process used to build a new copy of a shard on a follower node by copying data from the primary shard in the leader cluster. {es} uses this @@ -17,8 +19,8 @@ recovery is designed to support leader and follower clusters with high network latency between them. Information about an in-progress remote recovery can be obtained using the -<> api on the follower cluster. Remote recoveries are implemented -using the <> infrastructure. This means that +{ref}/cat-recovery.html[recovery API] on the follower cluster. Remote recoveries are implemented +using the {ref}/modules-snapshots.html[snapshot and restore] infrastructure. This means that on-going remote recoveries will be labelled as type `snapshot` in the recovery api. The following setting can be used to rate-limit the data transmitted during remote @@ -62,4 +64,4 @@ remote recoveries: These settings can be dynamically updated on a live cluster with the -<> API. +{ref}/cluster-update-settings.html[cluster update settings API]. From 5f656cad7eabc66264d88d504af86df45ee47605 Mon Sep 17 00:00:00 2001 From: Tim Brooks Date: Mon, 4 Mar 2019 14:36:52 -0700 Subject: [PATCH 5/7] Chnages --- docs/reference/ccr/getting-started.asciidoc | 4 ++-- docs/reference/ccr/remote-recovery.asciidoc | 23 +++++++++++---------- 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/docs/reference/ccr/getting-started.asciidoc b/docs/reference/ccr/getting-started.asciidoc index 26cc81db7fe93..1529e325485ee 100644 --- a/docs/reference/ccr/getting-started.asciidoc +++ b/docs/reference/ccr/getting-started.asciidoc @@ -253,10 +253,10 @@ PUT /server-metrics-copy/_ccr/follow?wait_for_active_shards=1 ////////////////////////// -The follower index is bootstrapped using the <> +The follower index is initialized using the <> process. The remote recovery process transfers the existing Lucene segment files from the leader to the follower. When the remote recovery process is complete, -the index following will be initiated. +the index following begins. Now when you index documents into your leader index, you will see these documents replicated in the follower index. You can diff --git a/docs/reference/ccr/remote-recovery.asciidoc b/docs/reference/ccr/remote-recovery.asciidoc index 508fe0407c632..25fbe8dd4064f 100644 --- a/docs/reference/ccr/remote-recovery.asciidoc +++ b/docs/reference/ccr/remote-recovery.asciidoc @@ -1,25 +1,26 @@ [[remote-recovery]] === Remote Recovery -Remote recovery is the process used to build a new copy of a shard on a follower -node by copying data from the primary shard in the leader cluster. {es} uses this -remote recovery process to bootstrap a follower index using the data from the -leader index. This allows the follower to receive a copy of the current state of -the leader index, even if a complete history of changes is not available on the -leader due to Lucene segment merging. +When you create a follower index, you cannot use it until it is fully initialized. +The _remote recovery_ process builds a new copy of a shard on a follower node by +copying data from the primary shard in the leader cluster. {es} uses this remote +recovery process to bootstrap a follower index using the data from the leader index. +This process provides the follower with a copy of the current state of the leader index, +even if a complete history of changes is not available on the leader due to Lucene +segment merging. Remote recovery is a network intensive process that transfers all of the Lucene segment files from the leader cluster to the follower cluster. The follower requests that a recovery session be initiated on the primary shard in the leader cluster. The follower then requests file chunks concurrently from the leader. By -default, the the process concurrently requests `5` large `1mb` file chunks as remote -recovery is designed to support leader and follower clusters with high network -latency between them. +default, the process concurrently requests `5` large `1mb` file chunks. This default +behavior is designed to support leader and follower clusters with high network latency +between them. -Information about an in-progress remote recovery can be obtained using the +You can obtain information about an in-progress remote recovery by using the <> api on the follower cluster. Remote recoveries are implemented using the <> infrastructure. This means that -on-going remote recoveries will be labelled as type `snapshot` in the recovery api. +on-going remote recoveries are labelled as type `snapshot` in the recovery API. The following setting can be used to rate-limit the data transmitted during remote recoveries: From 72897af2cceac974377d3cc113a17d0604562036 Mon Sep 17 00:00:00 2001 From: lcawl Date: Mon, 4 Mar 2019 14:34:32 -0800 Subject: [PATCH 6/7] [DOCS] Move CCR settings to Elasticsearch Ref --- docs/reference/ccr/remote-recovery.asciidoc | 55 +++---------------- docs/reference/settings/ccr-settings.asciidoc | 52 ++++++++++++++++++ .../settings/configuring-xes.asciidoc | 3 +- 3 files changed, 62 insertions(+), 48 deletions(-) create mode 100644 docs/reference/settings/ccr-settings.asciidoc diff --git a/docs/reference/ccr/remote-recovery.asciidoc b/docs/reference/ccr/remote-recovery.asciidoc index 16a8d7e8e9ca9..6b01cb0a3f0de 100644 --- a/docs/reference/ccr/remote-recovery.asciidoc +++ b/docs/reference/ccr/remote-recovery.asciidoc @@ -14,54 +14,15 @@ Remote recovery is a network intensive process that transfers all of the Lucene segment files from the leader cluster to the follower cluster. The follower requests that a recovery session be initiated on the primary shard in the leader cluster. The follower then requests file chunks concurrently from the leader. By -default, the the process concurrently requests `5` large `1mb` file chunks as remote +default, the process concurrently requests `5` large `1mb` file chunks as remote recovery is designed to support leader and follower clusters with high network latency between them. -Information about an in-progress remote recovery can be obtained using the -{ref}/cat-recovery.html[recovery API] on the follower cluster. Remote recoveries are implemented -using the {ref}/modules-snapshots.html[snapshot and restore] infrastructure. This means that -on-going remote recoveries will be labelled as type `snapshot` in the recovery api. - -The following setting can be used to rate-limit the data transmitted during remote -recoveries: - -`ccr.indices.recovery.max_bytes_per_sec`:: - Limits the total inbound and outbound remote recovery traffic on each node. - Since this limit applies on each node, but there may be many nodes - performing remote recoveries concurrently, the total amount of remote recovery bytes - may be much higher than this limit. If you set this limit too high then there - is a risk that ongoing remote recoveries will consume an excess of bandwidth - (or other resources) which could destabilize the cluster. This setting is used by both - the leader and follower clusters. For example if it is set to `20mb` on a leader, the - leader will only send `20mb/s` to the follower even if the follower is requesting and can - accept `60mb/s`. Defaults to `40mb`. - -The following _expert_ settings can be set to manage the resources consumed by -remote recoveries: - -`ccr.indices.recovery.max_concurrent_file_chunks`:: - Controls the number of file chunk requests that can be sent in parallel per recovery. - As multiple remote recoveries might already running in parallel, increasing this - expert-level setting might only help in situations where remote recovery of a single shard - is not reaching the total inbound and outbound remote recovery traffic as configured by - `ccr.indices.recovery.max_bytes_per_sec`. Defaults to `5`. The maximum allowed value is - `10`. +There are dynamic settings that you can use to rate-limit the transmitted data +and manage the resources consumed by remote recoveries. See +{ref}/ccr-settings.html[{ccr-cap} settings]. -`ccr.indices.recovery.chunk_size`:: - Controls the chunk size requested by the follower during file transfer. Defaults to - `1mb`. - -`ccr.indices.recovery.recovery_activity_timeout`:: - Controls the timeout for recovery activity. This timeout primarily applies on the leader - cluster. The leader cluster must open resources in-memory to supply data to the follower - during the recovery process. If the leader does not receive recovery requests from the - follower for this period of time, it will close the resources. Defaults to `60 seconds`. - -`ccr.indices.recovery.internal_action_timeout`:: - Controls the timeout for individual network requests during the remote recovery - process. An individual action timing out can fail the recovery. Defaults to `60 seconds`. - - -These settings can be dynamically updated on a live cluster with the -{ref}/cluster-update-settings.html[cluster update settings API]. +Information about an in-progress remote recovery can be obtained using the +{ref}/cat-recovery.html[recovery API] on the follower cluster. Remote recoveries +are implemented using the {ref}/modules-snapshots.html[snapshot and restore] infrastructure. This means that on-going remote recoveries will be labelled as +type `snapshot` in the recovery api. diff --git a/docs/reference/settings/ccr-settings.asciidoc b/docs/reference/settings/ccr-settings.asciidoc new file mode 100644 index 0000000000000..286bb421662ff --- /dev/null +++ b/docs/reference/settings/ccr-settings.asciidoc @@ -0,0 +1,52 @@ +[role="xpack"] +[[ccr-settings]] +=== {ccr-cap} settings + +These {ccr} settings can be dynamically updated on a live cluster with the +<>. + +[float] +[[ccr-recovery-settings]] +==== Remote recovery settings + +The following setting can be used to rate-limit the data transmitted during +{stack-ov}/remote-recovery.html[remote recoveries]: + +`ccr.indices.recovery.max_bytes_per_sec` (<>):: +Limits the total inbound and outbound remote recovery traffic on each node. +Since this limit applies on each node, but there may be many nodes performing +remote recoveries concurrently, the total amount of remote recovery bytes may be +much higher than this limit. If you set this limit too high then there is a risk +that ongoing remote recoveries will consume an excess of bandwidth (or other +resources) which could destabilize the cluster. This setting is used by both the +leader and follower clusters. For example if it is set to `20mb` on a leader, +the leader will only send `20mb/s` to the follower even if the follower is +requesting and can accept `60mb/s`. Defaults to `40mb`. + +[float] +[[ccr-advanced-recovery-settings]] +==== Advanced remote recovery settings + +The following _expert_ settings can be set to manage the resources consumed by +remote recoveries: + +`ccr.indices.recovery.max_concurrent_file_chunks` (<>):: +Controls the number of file chunk requests that can be sent in parallel per +recovery. As multiple remote recoveries might already running in parallel, +increasing this expert-level setting might only help in situations where remote +recovery of a single shard is not reaching the total inbound and outbound remote recovery traffic as configured by `ccr.indices.recovery.max_bytes_per_sec`. +Defaults to `5`. The maximum allowed value is `10`. + +`ccr.indices.recovery.chunk_size`(<>):: +Controls the chunk size requested by the follower during file transfer. Defaults to +`1mb`. + +`ccr.indices.recovery.recovery_activity_timeout`(<>):: +Controls the timeout for recovery activity. This timeout primarily applies on +the leader cluster. The leader cluster must open resources in-memory to supply +data to the follower during the recovery process. If the leader does not receive recovery requests from the follower for this period of time, it will close the resources. Defaults to 60 seconds. + +`ccr.indices.recovery.internal_action_timeout` (<>):: +Controls the timeout for individual network requests during the remote recovery +process. An individual action timing out can fail the recovery. Defaults to +60 seconds. diff --git a/docs/reference/settings/configuring-xes.asciidoc b/docs/reference/settings/configuring-xes.asciidoc index 29c6b95dddf0f..48401c1a03433 100644 --- a/docs/reference/settings/configuring-xes.asciidoc +++ b/docs/reference/settings/configuring-xes.asciidoc @@ -6,7 +6,8 @@ ++++ include::{asciidoc-dir}/../../shared/settings.asciidoc[] +include::ccr-settings.asciidoc[] include::license-settings.asciidoc[] include::ml-settings.asciidoc[] -include::notification-settings.asciidoc[] include::sql-settings.asciidoc[] +include::notification-settings.asciidoc[] From ef289babed833454e8923dee9bb0d0eaae3fd650 Mon Sep 17 00:00:00 2001 From: lcawl Date: Mon, 4 Mar 2019 19:03:56 -0800 Subject: [PATCH 7/7] [DOCS] Fixes broken links --- docs/reference/ccr/remote-recovery.asciidoc | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/reference/ccr/remote-recovery.asciidoc b/docs/reference/ccr/remote-recovery.asciidoc index c3d4afc8d7f25..fcf03cfc72814 100644 --- a/docs/reference/ccr/remote-recovery.asciidoc +++ b/docs/reference/ccr/remote-recovery.asciidoc @@ -1,5 +1,7 @@ +[role="xpack"] +[testenv="platinum"] [[remote-recovery]] -=== Remote Recovery +== Remote recovery When you create a follower index, you cannot use it until it is fully initialized. The _remote recovery_ process builds a new copy of a shard on a follower node by @@ -22,7 +24,6 @@ and manage the resources consumed by remote recoveries. See {ref}/ccr-settings.html[{ccr-cap} settings]. You can obtain information about an in-progress remote recovery by using the -<> api on the follower cluster. Remote recoveries are implemented -using the <> infrastructure. This means that -on-going remote recoveries are labelled as type `snapshot` in the recovery API. - +{ref}/cat-recovery.html[recovery API] on the follower cluster. Remote recoveries +are implemented using the {ref}/modules-snapshots.html[snapshot and restore] infrastructure. This means that on-going remote recoveries are labelled as type +`snapshot` in the recovery API.