Allowing for steps to retry #1630
Comments
We have similar issues with Redshift (especially Redshift Spectrum), where retries would be very beneficial.
It seems #1579 is also related.
We also ran into similar challenges in our BigQuery dbt runs. For example, in production we have situations where a backfill of historical data and a scheduled incremental run happen at the same time and update the same table. We then got transient errors that could be mitigated with some retry logic.
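Until dbt supports this natively, one mitigation is a generic backoff wrapper around the failing call at the orchestration layer. A minimal sketch follows; every name and value in it is illustrative, not a dbt API, and in practice you would narrow the caught exception to the specific transient error your warehouse raises.

```python
# Minimal, generic retry sketch (not dbt code): exponential backoff with
# jitter around any callable, useful when a backfill and an incremental
# run occasionally collide on the same table.
import random
import time

def retry_with_backoff(fn, max_attempts=3, base_delay_s=2.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the original error
            # Jittered exponential backoff avoids the two competing runs
            # retrying in lockstep and colliding again.
            time.sleep(base_delay_s * (2 ** attempt) * random.random())
```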
FYI, #1963 adds retries to BigQuery when queries fail with a 500 status code (internal server error). I'm going to close out this issue, as BigQuery is really the only place where 1) we see transient errors like this, and 2) we receive a status code indicating that retrying can solve the problem. Happy to re-open if anyone has any further thoughts on this topic.
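For context on what status-code-based retries look like, here is a minimal sketch (not dbt's actual implementation from #1963) using google-cloud-bigquery's standard retry machinery; the predicate and deadline values are illustrative assumptions.

```python
# Retry only transient server-side BigQuery failures (500/503) by passing
# a custom Retry object to client.query().
from google.api_core import exceptions, retry
from google.cloud import bigquery

def _is_retryable(error):
    # Retry only errors whose status code signals a transient failure.
    return isinstance(error, (exceptions.InternalServerError,   # 500
                              exceptions.ServiceUnavailable))   # 503

client = bigquery.Client()
transient_retry = retry.Retry(predicate=_is_retryable, deadline=120.0)

job = client.query("select 1", retry=transient_retry)
print(list(job.result()))
```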
In Spark, Presto, and Athena, retries would be super helpful. Particularly on Presto running on EMR with Spot Instances: when instances are reclaimed and queries fail, a retry would be extremely helpful. Newer versions of Presto do support this internally, I believe, but not the newest version available on EMR. In some cases, we are forced to rerun the entire dbt job.
@friendofasquid That's helpful context re: Presto. I totally agree about invocation-level retries, too. There's a more recent issue discussing this over in #3303.
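Pending a built-in feature, invocation-level retries can be approximated at the orchestration layer by re-running the dbt command itself. A minimal, hypothetical sketch; the command, attempt count, and backoff are assumptions, not dbt behavior.

```python
# Re-run the whole dbt invocation when it exits non-zero, e.g. after a
# Spot Instance reclamation kills the Presto cluster mid-query.
import subprocess
import time

def run_dbt_with_retries(max_attempts=3, backoff_s=60):
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(["dbt", "run"])
        if result.returncode == 0:
            return
        if attempt < max_attempts:
            time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise RuntimeError(f"dbt run failed after {max_attempts} attempts")
```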
Issue
BigQuery can hit transient issues, so it's important to be able to configure a certain number of retries for a step in a DAG.
Issue description
During a production run, we hit a transient BigQuery error, which failed that particular BigQuery query and all subsequent ones that depended on its output.
Results
What I expected was that dbt would have some configuration allowing us to set a number of allowable retries.
System information
This was a run on dbt Cloud.
Steps to reproduce
A transient error on BigQuery's side; hard to reproduce.
Feature
Feature description
Allow configuring retry logic for an individual step or for the whole DAG, as sketched below.
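A hypothetical sketch of what per-step retry configuration could look like in a DAG runner: each node carries its own retry budget, and only the failed node is re-executed. None of these names come from dbt itself.

```python
# Hypothetical per-node retry in a DAG runner. A node-level max_retries
# overrides a DAG-wide default; downstream nodes run only if upstream
# nodes eventually succeed.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Node:
    name: str
    run: Callable[[], None]
    max_retries: int = 0  # per-step override; 0 means fail fast

def execute(nodes: List[Node], default_retries: int = 0) -> None:
    for node in nodes:  # assumes nodes are already topologically sorted
        attempts = max(node.max_retries, default_retries) + 1
        for attempt in range(1, attempts + 1):
            try:
                node.run()
                break  # node succeeded; move to the next one
            except Exception:
                if attempt == attempts:
                    raise  # retries exhausted: fail the DAG here
```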
Who will this benefit?
Anyone that relies on dbt for production and can't have transient errors killing the whole DAG.