Redshift create table with backup option #18

krishbox · 2020-04-28T15:36:21Z

Describe the feature

The CREATE TABLE statement in Redshift has a configuration option that controls whether or not the table should be included in automated and manual cluster snapshots. The default setting for this option is YES. At present this is not configurable in DBT and the default is used for all table and incremental materialisations.

Allowing this parameter to be configured in DBT models would save processing time when creating snapshots and restoring from them, and would reduce storage space used on Amazon Simple Storage Service.

Describe alternatives you've considered

There is no way to get around this except to turn off automated snapshots at a cluster level.

To disable automated snapshots, set the retention period to zero. If you disable automated snapshots, Amazon Redshift stops taking snapshots and deletes any existing automated snapshots for the cluster.

Additional context

This feature is Redshift specific. It is similar to, but not exactly the same as, Snowflake's TRANSIENT table.

Who will this benefit?

This will benefit all DBT users who use Redshift and who want to control storage costs and to speed up snapshot creation and restores.

P.S. If you think this is worthwhile and in keeping with DBT's principles, then I would love to work on this issue with a PR.


drewbanin · 2020-04-29T12:02:30Z

hey @krishbox - I'd be happy for dbt to support a setting like this! If you're interested in sending through a PR, that would definitely be welcomed :)

Before you do that though, can we sketch out what the config interface would look like? Curious to hear what you have in mind

krishbox · 2020-05-04T03:07:00Z

Hi @drewbanin, I hope I have understood your question correctly!

I was thinking of having something like this in dbt_project.yml. The config would look like:

models:
  enabled: true
  materialized: table
  backup: true

I'm not entirely happy with the name of the option, suggestions welcome, other candidates:

automated_backup
snapshot_backup (although this would cause confusion with DBT snapshots)

drewbanin · 2020-05-04T12:41:03Z

Cool - backup: true sounds good to me!

Feel free to send through a PR for this when you can! Let us know if there's anything we can help out with :)

dlb8685 · 2021-03-26T18:59:54Z

If no one has jumped on this yet, I would be interested in working on this as a first-time contributor. Let me know if this sounds good.

I will also have some more specific questions next week, but for now I just wanted to say I'm interested in working on this.

krishbox · 2021-03-27T10:17:42Z

Sounds good @dlb8685, thank you!

dlb8685 · 2021-03-30T15:40:18Z

I think I'm off to a good start on getting tests to run and setting up the repo locally. I'm getting a little stuck on finding the relevant file(s) that convert models yaml to compiled SQL. Are there a couple of pointers on which part of the code base to focus in on?

jtcohen6 · 2021-03-30T16:48:54Z

@dlb8685 It sounds like the change involves a new config option, templated into the create table as statement (docs).

I believe the relevant bits of code for that change would be the redshift__create_table_as macro:

https://github.com/fishtown-analytics/dbt/blob/ce30dfa82d71f8a98e18c9cad61cebe0fc602d48/plugins/redshift/dbt/include/redshift/macros/adapters.sql#L32-L52

And the RedshiftConfig object, which defines the available node configs specific to Redshift features:

https://github.com/fishtown-analytics/dbt/blob/ce30dfa82d71f8a98e18c9cad61cebe0fc602d48/plugins/redshift/dbt/adapters/redshift/impl.py#L12-L16
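To make the shape of that change concrete, here is a minimal Python sketch of the idea, not the actual dbt code. `TableConfig` and `render_create_table_as` are hypothetical names standing in for the RedshiftConfig object and the redshift__create_table_as macro. The sketch assumes the clause is only emitted when backup is explicitly set to false, since Redshift already defaults to BACKUP YES.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableConfig:
    """Hypothetical stand-in for the adapter's node config object."""

    # None means "not configured": no BACKUP clause is emitted and
    # Redshift falls back to its default (BACKUP YES).
    backup: Optional[bool] = None


def render_create_table_as(relation: str, sql: str, config: TableConfig) -> str:
    """Render a CREATE TABLE AS statement, optionally opting out of backups."""
    # Only an explicit False adds the clause; True and None leave the DDL unchanged.
    backup_clause = " backup no" if config.backup is False else ""
    return f"create table {relation}{backup_clause} as (\n{sql}\n);"


if __name__ == "__main__":
    print(render_create_table_as("analytics.events", "select 1 as id", TableConfig(backup=False)))
```

Treating an unset value the same as true keeps the generated DDL identical for every existing project that never sets the option.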

dlb8685 · 2021-03-31T16:44:00Z

@jtcohen6 Thanks for the help, it definitely made things easier for me. I do think I have a solution now that is working on a little test project I have.

One last thing is, I'm trying to find where in the test files I could add an integration test for this. It looks like this may be the right general area? But I can't figure out where, if at all, there is a project configuration where I can add the backup: false option to something and then write a test to confirm it is working.

https://github.com/fishtown-analytics/dbt/blob/17e57f1e0b3669c055d2a757ab21695c0e86e4db/test/integration/054_adapter_methods_test/test_adapter_methods.py#L27-L32

jtcohen6 · 2021-03-31T17:51:54Z

@dlb8685 Glad to hear it!

The 054_adapter_methods_test is after something a bit more complex. I think you could create a new standalone test, perhaps within 034_redshift_test, that:

Includes a new table model
Runs dbt and confirms that backup does not appear in the materialization DDL statement
Sets backup: false for that table model, perhaps once via config() and once via dbt_project.yml
Runs dbt each time, and checks that "backup no" appears in the materialization DDL statement

As a final check, is there a system table that we can query, to confirm that the table has been opted out of backup?
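The DDL checks in the steps above could be sketched roughly as follows. This is an illustration only: `ddl_opts_out_of_backup` is a hypothetical helper, and the DDL strings are made-up examples rather than output captured from dbt. How the real integration test obtains the compiled statement is not shown here.

```python
import re

# Match the BACKUP NO clause as whole words, in any letter case,
# tolerating extra whitespace between the two keywords.
_BACKUP_NO = re.compile(r"\bbackup\s+no\b", re.IGNORECASE)


def ddl_opts_out_of_backup(ddl: str) -> bool:
    """Return True if the DDL statement contains a BACKUP NO clause."""
    return _BACKUP_NO.search(ddl) is not None


if __name__ == "__main__":
    # Hypothetical DDL for a model with no backup config set.
    default_ddl = "create table analytics.my_model as (select 1 as id);"
    # Hypothetical DDL for the same model with backup: false.
    opted_out_ddl = "create table analytics.my_model backup no as (select 1 as id);"

    assert not ddl_opts_out_of_backup(default_ddl)
    assert ddl_opts_out_of_backup(opted_out_ddl)
```

Matching on word boundaries avoids false positives from identifiers that merely start with "backup", such as a table named backup_notes.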

jtcohen6 · 2021-10-12T16:34:03Z

@dlb8685 I think the code changes in this repo will be just about identical to the ones you've made in dbt-labs/dbt-core#3221

Pointers:

impl.py
macros/
tests/integration — feel free to add to redshift_test, or a totally new one

dlb8685 · 2021-11-17T03:07:48Z

@jtcohen6 I have some changes now in my local fork. Can you remind me which branch in dbt-redshift I should create a PR into? I'm assuming not main, but the other options look pretty random.

jtcohen6 · 2021-11-17T07:28:48Z

@dlb8685 main is right! We may not be able to get this in for v1, but that's where we'll want to merge it regardless, for inclusion in a future minor version.

jtcohen6 transferred this issue from dbt-labs/dbt-core Oct 12, 2021

jtcohen6 added the type:enhancement and good_first_issue labels Oct 12, 2021

jtcohen6 mentioned this issue Oct 12, 2021: Add Redshift parameter to create tables with backup option specified dbt-labs/dbt-core#3221 (Closed)

dlb8685 mentioned this issue Nov 18, 2021: Add Redshift parameter to create tables with backup option specified #42 (Merged)

jtcohen6 closed this as completed in #42 Nov 24, 2021
