Add troubleshooting guide for corrupt repository #88391

Merged · 15 commits · Jul 14, 2022
Binary file added docs/reference/images/register_repo.png
Binary file added docs/reference/images/register_repo_details.png
Binary file added docs/reference/images/repo_details.png
Binary file added docs/reference/images/repositories.png
40 changes: 40 additions & 0 deletions docs/reference/tab-widgets/troubleshooting/snapshot/corrupt-repository-widget.asciidoc
@@ -0,0 +1,40 @@
++++
<div class="tabs" data-tab-group="host">
  <div role="tablist" aria-label="Re-add repository">
    <button role="tab"
            aria-selected="true"
            aria-controls="cloud-tab-readd-repo"
            id="cloud-readd-repo">
      Elasticsearch Service
    </button>
    <button role="tab"
            aria-selected="false"
            aria-controls="self-managed-tab-readd-repo"
            id="self-managed-readd-repo"
            tabindex="-1">
      Self-managed
    </button>
  </div>
  <div tabindex="0"
       role="tabpanel"
       id="cloud-tab-readd-repo"
       aria-labelledby="cloud-readd-repo">
++++

include::corrupt-repository.asciidoc[tag=cloud]

++++
  </div>
  <div tabindex="0"
       role="tabpanel"
       id="self-managed-tab-readd-repo"
       aria-labelledby="self-managed-readd-repo"
       hidden="">
++++

include::corrupt-repository.asciidoc[tag=self-managed]

++++
  </div>
</div>
++++
219 changes: 219 additions & 0 deletions docs/reference/tab-widgets/troubleshooting/snapshot/corrupt-repository.asciidoc
@@ -0,0 +1,219 @@
// tag::cloud[]
Fixing the corrupted repository entails making changes in all the deployments
that write to the same snapshot repository: only one deployment may write to a
given repository. We'll call the deployment that keeps writing to the repository
the "primary" deployment (the current cluster), and the other deployment(s),
where we'll mark the repository as read-only, the "secondary" deployments.

First mark the repository as read-only on the secondary deployments:

**Use {kib}**

//tag::kibana-api-ex[]
. Log in to the {ess-console}[{ecloud} console].
+

. On the **Elasticsearch Service** panel, click the name of your deployment.
+

NOTE: If the name of your deployment is disabled, your {kib} instances might be
unhealthy, in which case please contact https://support.elastic.co[Elastic Support].
If your deployment doesn't include {kib}, all you need to do is
{cloud}/ec-access-kibana.html[enable it first].

. Open your deployment's side navigation menu (under the Elastic logo in the upper left corner)
and go to **Stack Management > Snapshot and Restore > Repositories**.
+
[role="screenshot"]
image::images/repositories.png[{kib} Console,align="center"]

. The repositories table should now be visible. Click the pencil icon at the
right side of the repository you want to mark as read-only. On the Edit page that
opens, scroll down, check "Read-only repository", and click "Save".
Alternatively, if you prefer to delete the repository altogether, select the
checkbox at the left of the repository name in the repositories table and click
the "Remove repository" red button at the top left of the table.

At this point, the primary (current) deployment is the only one with the
repository marked as writable.
{es} still sees the repository as corrupt, so it needs to be removed and re-added
in order for {es} to resume using it.

Note that we're now configuring the primary (current) deployment.

. Open the primary deployment's side navigation menu (under the Elastic logo in the upper left corner)
and go to **Stack Management > Snapshot and Restore > Repositories**.
+
[role="screenshot"]
image::images/repositories.png[{kib} Console,align="center"]

. Get the details of the repository we'll recreate later: click the repository
name in the repositories table and note down all the repository configuration
values displayed on the details page (we'll use them when we recreate the
repository). Close the details page using the "X Close" link at the bottom left
of the page.
+
[role="screenshot"]
image::images/repo_details.png[{kib} Console,align="center"]

. With all the details noted down, delete the repository: select the checkbox
at the left of the repository name and click the "Remove repository" red button
at the top left of the page.

. Recreate the repository by clicking the "Register Repository" button
at the top right corner of the repositories table.
+
[role="screenshot"]
image::images/register_repo.png[{kib} Console,align="center"]

. Fill in the repository name, select the type and click "Next".
+
[role="screenshot"]
image::images/register_repo_details.png[{kib} Console,align="center"]

. Fill in the repository details (client, bucket, base path, and so on) with the
values you noted down before deleting the repository and click the "Register"
button at the bottom.

. Select "Verify repository" to confirm that your settings are correct and the
deployment can connect to your repository.
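
The same check is available through the API if you prefer it; as a sketch,
assuming the repository is named `my-repo`:

[source,console]
----
POST _snapshot/my-repo/_verify
----
// TEST[skip:we're not setting up repos in these tests]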
//end::kibana-api-ex[]
// end::cloud[]

// tag::self-managed[]
Fixing the corrupted repository entails making changes in all the clusters
that write to the same snapshot repository: only one cluster may write to a
given repository. Let's call the cluster that keeps writing to the repository
the "primary" cluster (the current cluster), and the other cluster(s),
where we'll mark the repository as read-only, the "secondary" clusters.

Let's first work on the secondary clusters:

. Get the configuration of the repository:
+
[source,console]
----
GET _snapshot/my-repo
----
// TEST[skip:we're not setting up repos in these tests]
+
The response will look like this:
+
[source,console-result]
----
{
  "my-repo": { <1>
    "type": "s3",
    "settings": {
      "bucket": "repo-bucket",
      "client": "elastic-internal-71bcd3",
      "base_path": "myrepo"
    }
  }
}
----
// TESTRESPONSE[skip:the result is for illustrating purposes only]
+
<1> Represents the current configuration for the repository.

. Using the settings retrieved above, add the `readonly: true` option to mark
it as read-only:
+
[source,console]
----
PUT _snapshot/my-repo
{
  "type": "s3",
  "settings": {
    "bucket": "repo-bucket",
    "client": "elastic-internal-71bcd3",
    "base_path": "myrepo",
    "readonly": true <1>
  }
}
----
// TEST[skip:we're not setting up repos in these tests]
+
<1> Marks the repository as read-only.
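+
To confirm the change took effect, you can retrieve the repository configuration
again; the response should now include the `readonly` setting:
+
[source,console]
----
GET _snapshot/my-repo
----
// TEST[skip:we're not setting up repos in these tests]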

. Alternatively, you can delete the repository:
+
[source,console]
----
DELETE _snapshot/my-repo
----
// TEST[skip:we're not setting up repos in these tests]
+
The response will look like this:
+
[source,console-result]
------------------------------------------------------------------------------
{
  "acknowledged": true
}
------------------------------------------------------------------------------
// TESTRESPONSE[skip:the result is for illustrating purposes only]

At this point, the primary (current) cluster is the only one with the repository
marked as writable.
{es} still sees the repository as corrupt though, so let's remove it and
recreate it so that {es} can resume using it.

Note that we're now configuring the primary (current) cluster.

. Get the configuration of the repository and save it, as we'll use it to
recreate the repository:
+
[source,console]
----
GET _snapshot/my-repo
----
// TEST[skip:we're not setting up repos in these tests]
. Delete the repository:
+
[source,console]
----
DELETE _snapshot/my-repo
----
// TEST[skip:we're not setting up repos in these tests]
+
The response will look like this:
+
[source,console-result]
------------------------------------------------------------------------------
{
  "acknowledged": true
}
------------------------------------------------------------------------------
// TESTRESPONSE[skip:the result is for illustrating purposes only]

. Using the configuration we obtained above, let's recreate the repository:
+
[source,console]
----
PUT _snapshot/my-repo
{
  "type": "s3",
  "settings": {
    "bucket": "repo-bucket",
    "client": "elastic-internal-71bcd3",
    "base_path": "myrepo"
  }
}
----
// TEST[skip:we're not setting up repos in these tests]
+
The response will look like this:
+
[source,console-result]
------------------------------------------------------------------------------
{
  "acknowledged": true
}
------------------------------------------------------------------------------
// TESTRESPONSE[skip:the result is for illustrating purposes only]
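
. Optionally, verify that the cluster can connect to and write to the recreated
repository. The verify repository API is a quick connectivity check, sketched
here with the same example repository name:
+
[source,console]
----
POST _snapshot/my-repo/_verify
----
// TEST[skip:we're not setting up repos in these tests]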
// end::self-managed[]

2 changes: 2 additions & 0 deletions docs/reference/troubleshooting.asciidoc
@@ -58,6 +58,8 @@ include::troubleshooting/data/start-ilm.asciidoc[]

include::troubleshooting/data/start-slm.asciidoc[]

include::troubleshooting/snapshot/add-repository.asciidoc[]

include::monitoring/troubleshooting.asciidoc[]

include::transform/troubleshooting.asciidoc[leveloffset=+1]
11 changes: 11 additions & 0 deletions docs/reference/troubleshooting/snapshot/add-repository.asciidoc
@@ -0,0 +1,11 @@
[[add-repository]]
== Multiple deployments writing to the same repository

Multiple {es} deployments are writing to the same snapshot repository. {es}
doesn't support this configuration: only one cluster may write to a given
repository.
To remedy the situation, mark the repository as read-only or remove it from all
the other deployments, and re-add (recreate) the repository in the current
deployment:

include::{es-repo-dir}/tab-widgets/troubleshooting/snapshot/corrupt-repository-widget.asciidoc[]