Curious Trino retry behaviour #22989
Comments
Is this related? #16397
How large is the result set your query is pulling? FINISHING suggests that the query is waiting for the data transfer to the client to complete. Also, what client are you using?
Thanks for your reply!

> How large is the result set your query is pulling?

> Also what client are you using?
There was a bug in the Python client regarding this in the past. See trinodb/trino-python-client#220. If you are using a version with that fix, this behaviour is indeed unexpected. Also, if you can reliably reproduce this, it would be useful to collect some HTTP request logging from the Python client. We have trinodb/trino-python-client#463, which we've been unable to reproduce and track down. You can do that by adding the below before importing the Python client:
And AFTER importing the Python client, add:
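The exact snippets from this comment are not reproduced here. Below is a minimal sketch of one common way to get this output, using only the standard library. It assumes that the Python client sends its requests through `requests`/`urllib3` (which are built on `http.client`) and that it logs under the `trino` logger namespace, setting that logger's level when it is imported. That assumption is the reason for the before/after split. The two function names are hypothetical.

```python
import http.client
import logging


def enable_http_debug_logging() -> None:
    """Call BEFORE `import trino`: log raw HTTP requests and responses."""
    # http.client prints request lines and response headers to stdout when
    # debuglevel > 0 (this is print output, not logging output).
    http.client.HTTPConnection.debuglevel = 1
    # Route DEBUG records from all loggers to stderr (no-op if the root
    # logger already has handlers).
    logging.basicConfig(level=logging.DEBUG)
    # urllib3 logs connection and retry details at DEBUG level.
    urllib3_log = logging.getLogger("urllib3")
    urllib3_log.setLevel(logging.DEBUG)
    urllib3_log.propagate = True


def enable_trino_debug_logging() -> None:
    """Call AFTER `import trino`: raise the client's own logger to DEBUG."""
    # Assumption: importing the client sets the "trino" logger's level,
    # so this override has to run after the import.
    logging.getLogger("trino").setLevel(logging.DEBUG)


# Intended order (needs the trino package installed, so shown as comments):
# enable_http_debug_logging()
# import trino
# enable_trino_debug_logging()
```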
Here's an example. First, the times (I recorded them by hand, so there's an error of +/- one or two seconds):

09:49:31 Query appears. Status PLANNING

*Changing to BLOCKED a second time doesn't usually happen.

Here are the logs. Two things I notice: (1) The frequency of the entries increases at 09:50:37.
The entries are then stable for a few minutes. Here's an extract:
These are the final log entries:
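A change in log frequency like the one at 09:50:37 is easier to pin down if the entries are counted per fixed time window. The following is a small Python sketch. It assumes that the timestamps have already been pulled out of the log lines as same-day `HH:MM:SS` strings. The log line format is not shown here, so the extraction step is left out. The function name `entries_per_bucket` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def entries_per_bucket(timestamps, bucket_seconds=10, fmt="%H:%M:%S"):
    """Count log entries per fixed-width time bucket.

    `timestamps` are wall-clock strings such as "09:50:37", all on the
    same day. Returns {bucket_start: count}, ordered by bucket start.
    """
    counts = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, fmt)
        seconds = t.hour * 3600 + t.minute * 60 + t.second
        # Round down to the start of the bucket this entry falls into.
        start = seconds - seconds % bucket_seconds
        label = f"{start // 3600:02d}:{start % 3600 // 60:02d}:{start % 60:02d}"
        counts[label] += 1
    # Zero-padded labels sort chronologically within a day.
    return dict(sorted(counts.items()))
```

A sudden rise in the per-bucket count marks the point where the client starts issuing requests more often.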
Here are some extracts from the logs of the Trino coordinator pod (notice that these times are in UTC):
Also related? #18467
(This is a repost of a question on Stack Overflow.)
Short version
After retrying, queries hang on status FINISHING for five minutes.
Long version
The following is an extract from the `values.yaml` that we are using for version 0.17.0 of the Trino chart (see fault-tolerant execution):

As the value of `tag` indicates, we are using release 423 of Trino.

I start some queries and then manually delete some pods. After the amount of time set by `query.remote-task.max-error-duration` (in this case one second, but I have tried different values), the statuses of the queries change to BLOCKED. A few seconds pass, the queries resume (status RUNNING), some more time passes, and then the statuses reach FINISHING. So far so good. But this is where it gets a little strange: the statuses stay on FINISHING until five minutes (300 seconds) after they changed to BLOCKED. I've tried it several times with lots of different queries and it consistently follows this behaviour, so it must be a config setting, but I don't know which one. I have tried changing the value of `query.client.timeout` (see docs), since this is the only setting I could find with a default value of 5 minutes, but it made no difference. The time spent on status FINISHING just seems like wasted time to me, so I would like to get to the bottom of this issue.
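To test the claim that the queries always leave FINISHING about 300 seconds after they entered BLOCKED, the hand-recorded transition times can be checked against a fixed timer. The following is a minimal Python sketch. It assumes same-day `HH:MM:SS` times, such as 09:49:31, and uses a default tolerance of 2 seconds to match the stated +/- one or two seconds of recording error. Both function names are hypothetical.

```python
from datetime import datetime


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Seconds from `start` to `end`, two wall-clock times on the same day."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


def matches_fixed_timer(blocked_at, left_finishing_at,
                        timer_seconds=300, tolerance_seconds=2):
    """True if FINISHING ended roughly `timer_seconds` after BLOCKED began.

    The tolerance covers the error in timings recorded by hand.
    """
    elapsed = seconds_between(blocked_at, left_finishing_at)
    return abs(elapsed - timer_seconds) <= tolerance_seconds
```

Running this over several queries shows whether the end of FINISHING tracks the BLOCKED transition rather than the moment FINISHING began.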