Make Cloud Storage client retry on backend error #3586
I've contacted the storage backend team. If they aren't against it, I'll add the retry logic.
Based on the discussion with the storage backend team, the 410 happens during a JSON API resumable upload session. The error likely indicates that the upload session has already been terminated, so retrying the individual HTTP request would not work: the entire upload session has to be restarted. An internal bug has been filed and the storage team is actively working on it.
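If a 410 means the resumable session itself is gone, recovery means opening a new session and re-sending the data from byte 0 rather than retrying the failed request. A minimal sketch of that pattern follows; `UploadSessionGone`, `start_session`, and `send_chunk` are hypothetical stand-ins (not the google-cloud-storage API), and the chunk size and session limit are arbitrary.

```python
class UploadSessionGone(Exception):
    """Hypothetical error: the server answered 410 Gone for a resumable session."""


def upload_with_session_restart(data, start_session, send_chunk,
                                chunk_size=4, max_sessions=3):
    """Upload `data` via a resumable session, restarting the whole session on 410.

    start_session() -> session handle            (hypothetical callback)
    send_chunk(session, offset, chunk) -> None   (hypothetical callback;
        raises UploadSessionGone when the session has been terminated)
    """
    if max_sessions < 1:
        raise ValueError("max_sessions must be at least 1")
    last_error = None
    for _ in range(max_sessions):
        # A terminated session cannot be resumed, so each attempt opens a
        # fresh session and re-sends the data from offset 0.
        session = start_session()
        try:
            offset = 0
            while offset < len(data):
                chunk = data[offset:offset + chunk_size]
                send_chunk(session, offset, chunk)
                offset += len(chunk)
            return session
        except UploadSessionGone as err:
            # Retrying the same chunk request would just get another 410.
            last_error = err
    raise last_error
```

The restart happens one level above individual HTTP requests, which is why a per-request retry policy alone cannot fix this case.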
@hzyi-google and @JustinBeckwith
For Googlers: b/116709007 is the internal bug. Dataflow already effectively retries 410s, since it retries every failed shard 4 times regardless of why it failed, so this won't be a problem for Dataflow users. b/115694839 tracks the implementation of resumable uploads. People who hit these 410s directly in their own projects may need to wait for this feature.
This issue is important, but unfortunately it cannot be solved in the client. I'm going to close it, since there's nothing we can do here.
@hzyi-google Could you please update the status of the internal bug? It's been almost a year... |
We're operating at scale on GCS and regularly see transient HTTP 410 status codes when accessing Cloud Storage. These 410 responses are misleading, though: they effectively hide an internal backend error on GCS, which is reflected in the error details:
The google-cloud-storage client does not treat the 410 status code as retryable, understandably so. It should, however, retry on backend errors, which are typically exposed with status code 500 or 503. I suggest that the client treat backend errors the same way it treats internal errors, namely by matching on `reason == backendError` independently of the HTTP status code. Note that we're not the first to experience this, and the client should be resilient against these transient GCS errors.
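The proposal amounts to a retry predicate that checks the error `reason` in addition to the status code. A minimal sketch follows, assuming a hypothetical `ApiError` type and an illustrative set of retryable codes; neither is taken from the google-cloud-storage client. Per the maintainers' reply, request-level retries like this would not recover a 410 that ends a resumable upload session.

```python
import time


class ApiError(Exception):
    """Hypothetical error carrying the HTTP status and the JSON API `reason`."""

    def __init__(self, code, reason):
        super().__init__(f"{code} {reason}")
        self.code = code
        self.reason = reason


# Assumed policy for illustration; not the client's actual configuration.
RETRYABLE_CODES = frozenset({408, 429, 500, 502, 503, 504})
# Matched regardless of status code, so a 410 with reason "backendError" retries.
RETRYABLE_REASONS = frozenset({"internalError", "backendError"})


def is_retryable(err):
    return err.code in RETRYABLE_CODES or err.reason in RETRYABLE_REASONS


def call_with_retries(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying retryable ApiErrors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiError as err:
            if attempt == max_attempts or not is_retryable(err):
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
```

Passing `sleep` as a parameter is only a testing convenience. Exponential backoff is used so that retries do not pile onto a backend that is already failing.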