Allowing for steps to retry #1630
Comments
We have similar issues with Redshift (especially Redshift Spectrum), where retries would be very beneficial.
It seems #1579 is also related.
We also ran into similar challenges in our BigQuery dbt runs. For example, in production we have situations where a backfill of historical data and a scheduled incremental run happen at the same time and update the same table. We then got transient errors that could be mitigated with some retry logic.
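Until dbt supports this natively, one mitigation is a generic backoff wrapper around the failing call at the orchestration layer. A minimal sketch follows; every name and value in it is illustrative, not a dbt API, and in practice you would narrow the caught exception to the specific transient error your warehouse raises.

```python
# Minimal, generic retry sketch (not dbt code): exponential backoff with
# jitter around any callable, useful when a backfill and an incremental
# run occasionally collide on the same table.
import random
import time

def retry_with_backoff(fn, max_attempts=3, base_delay_s=2.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the original error
            # Jittered exponential backoff avoids the two competing runs
            # retrying in lockstep and colliding again.
            time.sleep(base_delay_s * (2 ** attempt) * random.random())
```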
FYI, #1963 adds retries to BigQuery when queries fail with a 500 status code (internal server error). I'm going to close out this issue, as BigQuery is really the only place where 1) we see transient errors like this, and 2) we receive a status code indicating that retrying can solve the problem. Happy to re-open if anyone has any further thoughts on this topic.
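For context on what status-code-based retries look like, here is a minimal sketch (not dbt's actual implementation from #1963) using google-cloud-bigquery's standard retry machinery; the predicate and deadline values are illustrative assumptions.

```python
# Retry only transient server-side BigQuery failures (500/503) by passing
# a custom Retry object to client.query().
from google.api_core import exceptions, retry
from google.cloud import bigquery

def _is_retryable(error):
    # Retry only errors whose status code signals a transient failure.
    return isinstance(error, (exceptions.InternalServerError,   # 500
                              exceptions.ServiceUnavailable))   # 503

client = bigquery.Client()
transient_retry = retry.Retry(predicate=_is_retryable, deadline=120.0)

job = client.query("select 1", retry=transient_retry)
print(list(job.result()))
```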
In Spark, Presto, and Athena, retries would be super helpful. Particularly on Presto running on EMR with Spot Instances: when instances are reclaimed and queries fail, a retry would be extremely helpful. Newer versions of Presto do support this internally, I believe, but not the newest version available on EMR. In some cases, we are forced to rerun the entire dbt job.
@friendofasquid That's helpful context re: Presto. I totally agree about invocation-level retries, too. There's a more recent issue discussing this over in #3303.
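Pending a built-in feature, invocation-level retries can be approximated at the orchestration layer by re-running the dbt command itself. A minimal, hypothetical sketch; the command, attempt count, and backoff are assumptions, not dbt behavior.

```python
# Re-run the whole dbt invocation when it exits non-zero, e.g. after a
# Spot Instance reclamation kills the Presto cluster mid-query.
import subprocess
import time

def run_dbt_with_retries(max_attempts=3, backoff_s=60):
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(["dbt", "run"])
        if result.returncode == 0:
            return
        if attempt < max_attempts:
            time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise RuntimeError(f"dbt run failed after {max_attempts} attempts")
```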
Issue
BigQuery can hit transient issues, so it's important to be able to configure a certain number of retries for a step in a DAG.
Issue description
During a production run, we hit a transient BigQuery error, which failed that particular BigQuery query and all subsequent ones that depended on its output.
Results
What I expected was that dbt would have some configuration allowing us to set a number of allowable retries.
System information
This was a run on dbt Cloud.
Steps to reproduce
A transient error on BigQuery's side; hard to reproduce.
Feature
Feature description
Allow configuring retry logic for an individual step or for the whole DAG, as sketched below.
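A hypothetical sketch of what per-step retry configuration could look like in a DAG runner: each node carries its own retry budget, and only the failed node is re-executed. None of these names come from dbt itself.

```python
# Hypothetical per-node retry in a DAG runner. A node-level max_retries
# overrides a DAG-wide default; downstream nodes run only if upstream
# nodes eventually succeed.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Node:
    name: str
    run: Callable[[], None]
    max_retries: int = 0  # per-step override; 0 means fail fast

def execute(nodes: List[Node], default_retries: int = 0) -> None:
    for node in nodes:  # assumes nodes are already topologically sorted
        attempts = max(node.max_retries, default_retries) + 1
        for attempt in range(1, attempts + 1):
            try:
                node.run()
                break  # node succeeded; move to the next one
            except Exception:
                if attempt == attempts:
                    raise  # retries exhausted: fail the DAG here
```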
Who will this benefit?
Anyone that relies on dbt for production and can't have transient errors killing the whole DAG.