Return 429 status code when there's a read_only cluster block #50166

gaobinlong · 2019-12-13T08:51:28Z

This PR is related to #49393.
The main point of the change is:

Modify the status code from 403 to 429 when there is a cluster or index level block.
Modify the error message of ClusterBlockException.

elasticmachine · 2019-12-13T09:27:59Z

Pinging @elastic/es-distributed (:Distributed/CRUD)

jasontedor

Thanks for picking this up @gaobinlong. In the issue #49393 I did say:

Similarly, the status codes of other cluster blocks should be reexamined in this context.

However, I think this change took it too far. For example, if a write block is placed on an index manually by a user to prevent any further writes to the index, is that really a retryable situation? I would argue not.

Can you take another stab at thinking through which other blocks should return 429?

gaobinlong · 2019-12-27T03:38:45Z

@jasontedor, I have changed the code so that the status code is only modified to 429 when there is a read_only cluster block:

index.read_only
index.read_only_allow_delete
cluster.read_only
cluster.read_only_allow_delete

These cluster blocks have the METADATA_WRITE and WRITE block levels. I think they can return 429 so that users can retry write operations and avoid data loss.

jasontedor · 2019-12-27T03:44:20Z

I’m on vacation, can someone on @elastic/es-distributed review this while I’m out?

henningandersen

Thanks for the extra work on this, @gaobinlong. First of all, please do not force-push to PRs: it makes it hard to review the changes made since last time, and comments lose their meaning when we cannot see what the original version of the code was.

I think that of the 4 blocks modified here, we should only return TOO_MANY_REQUESTS for the one block where the disk threshold monitor clears the block automatically (i.e., the index level index.blocks.read_only_allow_delete). I worry that the other cases could have legitimate user-facing use cases (for instance manually marking an index read-only) that this change would break.

Will you look into amending the PR (no force push, please)?
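The rule proposed in this review can be sketched as a small status mapping. This is an illustrative sketch only: the `ReadOnlyBlock` enum, its constant names, and `restStatus()` are hypothetical and are not Elasticsearch's classes. It assumes, as stated above, that only the index-level read_only_allow_delete block is cleared automatically.

```java
// Hypothetical enum for illustration; Elasticsearch models cluster blocks differently.
enum ReadOnlyBlock {
    INDEX_READ_ONLY(false),
    // Assumed, per the review comment, to be the only block the disk threshold monitor clears by itself.
    INDEX_READ_ONLY_ALLOW_DELETE(true),
    CLUSTER_READ_ONLY(false),
    CLUSTER_READ_ONLY_ALLOW_DELETE(false);

    private final boolean releasedAutomatically;

    ReadOnlyBlock(boolean releasedAutomatically) {
        this.releasedAutomatically = releasedAutomatically;
    }

    /** Returns 429 only when the block is expected to clear without user action, otherwise 403. */
    int restStatus() {
        return releasedAutomatically ? 429 : 403;
    }
}

final class BlockStatusDemo {
    public static void main(String[] args) {
        // Print the status each hypothetical block would map to.
        for (ReadOnlyBlock block : ReadOnlyBlock.values()) {
            System.out.println(block + " -> " + block.restStatus());
        }
    }
}
```

The design point is that 429 is reserved for a condition that resolves on its own, so a client that retries on 429 never loops forever against a block a user set on purpose.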

gaobinlong · 2019-12-27T08:42:29Z

@henningandersen Thanks for your comment and advice. I have pushed a new commit that only returns TOO_MANY_REQUESTS for the index level read_only_allow_delete block. Could you review the change?

henningandersen

Thanks, looks good. I left two smaller things to address. I will start a CI job for this too.

test/framework/src/main/java/org/elasticsearch/test/hamcrest/ElasticsearchAssertions.java

henningandersen · 2019-12-30T09:01:02Z

@elasticmachine test this please

henningandersen

Thanks @gaobinlong. I added a few more comments. AFAICS, the new logic to prefer 403 over 429 will not work; please have a look.

test/framework/src/main/java/org/elasticsearch/test/hamcrest/ElasticsearchAssertions.java

server/src/main/java/org/elasticsearch/cluster/block/ClusterBlockException.java
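The intent behind "prefer 403 over 429" (and the later commit titled "return retryable status if there are only retryable blocks") can be sketched as follows. This is a minimal sketch under assumptions, not the code in ClusterBlockException: the `AggregateStatusSketch` class, its nested `Block` type, and the choice to return the highest non-retryable status are all hypothetical.

```java
import java.util.List;

final class AggregateStatusSketch {
    static final int FORBIDDEN = 403;
    static final int TOO_MANY_REQUESTS = 429;

    /** Hypothetical minimal block model: a label plus the HTTP status the block maps to. */
    static final class Block {
        final String label;
        final int status;

        Block(String label, int status) {
            this.label = label;
            this.status = status;
        }

        boolean retryable() {
            return status == TOO_MANY_REQUESTS;
        }
    }

    /**
     * Returns 429 only if every block is retryable. If any block is not retryable,
     * the highest non-retryable status wins (an assumption made for this sketch).
     */
    static int status(List<Block> blocks) {
        if (blocks.isEmpty()) {
            throw new IllegalArgumentException("at least one block is required");
        }
        int highestNonRetryable = -1;
        for (Block block : blocks) {
            if (!block.retryable()) {
                highestNonRetryable = Math.max(highestNonRetryable, block.status);
            }
        }
        return highestNonRetryable == -1 ? TOO_MANY_REQUESTS : highestNonRetryable;
    }

    public static void main(String[] args) {
        Block diskFull = new Block("index read-only / allow delete", TOO_MANY_REQUESTS);
        Block manual = new Block("index read-only", FORBIDDEN);
        System.out.println(status(List.of(diskFull)));
        System.out.println(status(List.of(diskFull, manual)));
    }
}
```

Note that the result does not depend on the order of the blocks, which is the property a "prefer 403" rule needs when a request hits several blocks at once.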

gaobinlong · 2020-01-15T03:21:27Z

Hi @henningandersen, could you take some time to review the new commit I have pushed? Thanks a lot.

henningandersen

Thanks @gaobinlong for the extra iteration. I left a few smaller comments, but otherwise it is looking good. I will kick off a test run too.

server/src/main/java/org/elasticsearch/cluster/block/ClusterBlockException.java

test/framework/src/test/java/org/elasticsearch/test/hamcrest/ElasticsearchAssertionsTests.java

henningandersen · 2020-01-15T07:39:53Z

@elasticmachine test this please

henningandersen · 2020-01-15T08:37:43Z

@gaobinlong the test failures look like they could be caused by incompatibilities between this branch and the latest changes in master/7.x. Will you merge the master changes into this branch?

gaobinlong · 2020-01-17T04:36:57Z

Hi @henningandersen, I have merged master changes into the branch and optimized some code following your advice.

henningandersen · 2020-01-24T08:52:30Z

@elasticmachine test this please

henningandersen · 2020-01-24T09:11:10Z

@gaobinlong, nearly there, but unfortunately the branch is again out of date with master/7.x, causing the build to fail. I can merge in the changes unless you prefer to do so yourself?

gaobinlong · 2020-02-01T06:30:50Z

@henningandersen, it would be great if you could merge in the master changes. Thanks a lot.

henningandersen · 2020-02-01T12:33:10Z

@elasticmachine update branch

henningandersen

Nice catch, @gaobinlong. I added a couple of comments. Thanks again for your efforts.

test/framework/src/main/java/org/elasticsearch/test/ESIntegTestCase.java

modules/reindex/src/test/java/org/elasticsearch/index/reindex/DeleteByQueryBasicTests.java

gaobinlong · 2020-02-08T04:53:55Z

@henningandersen, the new commit I have pushed contains some changes following your comment. The difference is that instead of updating the setting cluster.info.update.interval, the cluster info is refreshed manually to ensure the test runs faster, because the setting's minimum value is 10s. Can you have a look at the changes?

henningandersen

Sorry for the delay here. I have one comment left and then I think we are good.

modules/reindex/src/test/java/org/elasticsearch/index/reindex/DeleteByQueryBasicTests.java

gaobinlong · 2020-02-16T07:18:39Z

@henningandersen, I have pushed a new commit which delays refreshing the cluster info using ThreadPool.schedule(). I can see that the retry handler is triggered when the disk allocation decider is enabled. Could you help review the code changes?
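The idea of releasing the block after a delay, so that a retrying writer first sees the block and only then succeeds, can be sketched with the JDK scheduler. This is not the test code from this PR: it uses `java.util.concurrent.ScheduledExecutorService` as a stand-in for Elasticsearch's ThreadPool, and the `DelayedReleaseSketch` class and its method are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class DelayedReleaseSketch {
    /** Returns the number of attempts the writer needed, or -1 if it gave up or was interrupted. */
    static int writeWhileBlockIsReleasedLater(long releaseDelayMillis, long retryIntervalMillis, int maxAttempts) {
        AtomicBoolean blocked = new AtomicBoolean(true);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // Stand-in for the delayed cluster info refresh that allows the block to be lifted.
            scheduler.schedule(() -> blocked.set(false), releaseDelayMillis, TimeUnit.MILLISECONDS);
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                // Simulated write: 429 while the block is in place, 200 once it is gone.
                int status = blocked.get() ? 429 : 200;
                if (status == 200) {
                    return attempt;
                }
                try {
                    Thread.sleep(retryIntervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
            return -1;
        } finally {
            // Cancel the pending release, if any, and stop the scheduler thread.
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int attempts = writeWhileBlockIsReleasedLater(200, 20, 500);
        System.out.println("write succeeded after " + attempts + " attempts");
    }
}
```

Delaying the release is what guarantees at least one blocked attempt, which is the condition needed to observe a retry handler being triggered.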

henningandersen · 2020-02-17T07:04:14Z

@elasticmachine test this please

henningandersen · 2020-02-17T20:43:15Z

@gaobinlong there are a few checkstyle issues reported by the build, see:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/16094/console

Notice that you can run:

./gradlew precommit

to check all precommit checks before pushing.

Would you mind looking into those?

gaobinlong · 2020-02-19T14:46:37Z

@henningandersen, I have done that and merged the master changes into the branch.

henningandersen · 2020-02-19T16:01:56Z

@elasticmachine test this please

henningandersen

LGTM, thanks for the additional iterations @gaobinlong.

…#50166)

We consider index level read_only_allow_delete blocks temporary since the DiskThresholdMonitor can automatically release those when an index is no longer allocated on nodes above high threshold. The rest status has therefore been changed to 429 when encountering this index block to signal retryability to clients.

Related to elastic#49393
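What "signal retryability to clients" means in practice can be sketched from the client side. This is a hedged illustration, not code from any Elasticsearch client: `RetryOn429Sketch`, `sendWithRetry`, and the use of an `IntSupplier` to stand in for "send the request and return the HTTP status" are all hypothetical.

```java
import java.util.function.IntSupplier;

final class RetryOn429Sketch {
    /**
     * Sends a request up to maxAttempts times, backing off exponentially while the
     * response is 429. Any other status, including 403, is returned immediately.
     */
    static int sendWithRetry(IntSupplier request, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoff = initialBackoffMillis;
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == 429; attempt++) {
            try {
                Thread.sleep(backoff);
            } catch (InterruptedException e) {
                // Preserve the interrupt and report the last status seen.
                Thread.currentThread().interrupt();
                return status;
            }
            backoff *= 2;
            status = request.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated server: blocked for the first two calls, then the block is released.
        int status = sendWithRetry(() -> ++calls[0] < 3 ? 429 : 200, 5, 10L);
        System.out.println("final status " + status + " after " + calls[0] + " calls");
    }
}
```

With the previous 403 response a client written this way would give up on the first attempt, even though the disk-related block may be released shortly afterwards.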

Users are perennially confused by the message they get when writing to an index is blocked due to excessive disk usage:

    TOO_MANY_REQUESTS/12/index read-only / allow delete (api)

Of course this is technically accurate but it is hard to join the dots from this message to "your disk was too full" without some searching of forums and documentation. Additionally in elastic#50166 we changed the status code to today's `429` from the previous `403` which changed the message from the one that's widely documented elsewhere:

    FORBIDDEN/12/index read-only / allow delete (api)

Since elastic#42559 we've considered this block to be under the sole control of the disk-based shard allocator, and we have seen no evidence to suggest that anyone is applying this block manually. Therefore this commit adjusts this block's message to indicate that it's caused by a lack of disk space.

jimczi added the :Distributed Indexing/CRUD and >bug labels Dec 13, 2019

jasontedor requested changes Dec 17, 2019

Return 429 status when there is a read_only block

d114f31

gaobinlong force-pushed the gaobinlong-patch-1 branch from e93f73d to d114f31 Compare December 27, 2019 02:32

gaobinlong changed the title from "Return 429 status code when there's a cluster or index block" to "Return 429 status code when there's a read_only cluster block" Dec 27, 2019

henningandersen requested changes Dec 27, 2019

Return 429 status code when there is a index read_only_allow_delete block

be78398

henningandersen requested changes Dec 30, 2019

return retryable status if there are only retryable blocks

0e7a4d7

henningandersen requested changes Jan 6, 2020

bellengao added 2 commits January 9, 2020 17:24

fix bug, status not correct

3bd3998

remove throwing IOException because testAssertBlocked never throws Exception

63a34b7

henningandersen requested changes Jan 15, 2020

bellengao added 2 commits January 17, 2020 12:27

Merge remote-tracking branch 'origin/master' into gaobinlong-patch-1

b0c716c

optimize some code in ElasticsearchAssertionsTests

b80007f

Merge branch 'master' into gaobinlong-patch-1

5b0e020

henningandersen reviewed Feb 6, 2020

bellengao added 2 commits February 8, 2020 12:20

add testDeleteByQueryOnReadOnlyAllowDeleteIndex method in DeleteByQueryBasicTests

0536c26

Merge remote-tracking branch 'origin/master' into gaobinlong-patch-1

4a87480

henningandersen reviewed Feb 14, 2020

bellengao added 2 commits February 16, 2020 13:25

Delay refreshing cluster info

75a0e97

Merge remote-tracking branch 'origin/master' into gaobinlong-patch-1

a40cdfc

bellengao added 3 commits February 18, 2020 21:28

Merge remote-tracking branch 'origin/master' into gaobinlong-patch-1

5bfcb78

format some code

cc13462

Merge remote-tracking branch 'origin/master' into gaobinlong-patch-1

5a1c77c

henningandersen approved these changes Feb 19, 2020

henningandersen merged commit 36bd666 into elastic:master Feb 22, 2020

henningandersen added v7.7.0 v8.0.0 labels Feb 22, 2020

codebrain mentioned this pull request Apr 1, 2020

7.7.0 meta ticket (Part 3) elastic/elasticsearch-net#4534

Closed

DaveCTurner mentioned this pull request Jun 23, 2020

Reword INDEX_READ_ONLY_ALLOW_DELETE_BLOCK message #58410

Merged

xeraa mentioned this pull request Mar 27, 2021

Logstash keeps retrying after receiving 403 Forbidden from Elasticsearch elastic/logstash#10023

Closed

zez3 mentioned this pull request Mar 27, 2021

Elasticsearch does not indicate retryability when flood stage is exceeded #49393

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
