Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redshift create table with backup option #18

Closed
krishbox opened this issue Apr 28, 2020 · 12 comments · Fixed by #42
Closed

Redshift create table with backup option #18

krishbox opened this issue Apr 28, 2020 · 12 comments · Fixed by #42
Labels
good_first_issue Good for newcomers type:enhancement New feature or request

Comments

@krishbox
Copy link

Describe the feature

The CREATE TABLE statement in Redshift has a configuration option that controls whether or not the table should be included in automated and manual cluster snapshots. The default setting for this option is YES. At present this is not configurable in DBT and the default is used for all table and incremental materialisations.

By allowing configuration of this parameter in DBT models we can save processing time when creating snapshots and restoring from snapshots and to reduce storage space on Amazon Simple Storage Service.

Describe alternatives you've considered

There is no way to get around this except to turn off automated snapshots at a cluster level.

To disable automated snapshots, set the retention period to zero. If you disable automated snapshots, Amazon Redshift stops taking snapshots and deletes any existing automated snapshots for the cluster.

Additional context

This feature is Redshift specific. The feature is similar but not exactly the same as Snowflakes TRANSIENT table.

Who will this benefit?

This will benefit all DBT users who use Redshift and who want to control storage costs and to speed up snapshot creation and restores.

P.S. If you think this is worthwhile and in keeping with DBT's principles, then I would love to work on this issue with a PR.

@drewbanin
Copy link
Contributor

hey @krishbox - I'd be happy for dbt to support a setting like this! If you're interested in sending through a PR, that would definitely be welcomed :)

Before you do that though, can we sketch out what the config interface would look like? Curious to hear what you have in mind

@krishbox
Copy link
Author

krishbox commented May 4, 2020

Hi @drewbanin, I hope I have understood your question correctly!

I was thinking of having something like this in dbt_project.yml. the config would look like

models:
  enabled: true
  materialized: table
  backup: true

I'm not entirely happy with the name of the option, suggestions welcome, other candidates:

  • automated_backup
  • snapshot_backup (although this would cause confusion with DBT snapshots)

@drewbanin
Copy link
Contributor

Cool - backup: true sounds good to me!

Feel free to send through a PR for this when you can! Let us know if there's anything we can help out with :)

@dlb8685
Copy link
Contributor

dlb8685 commented Mar 26, 2021

If no one has jumped on this yet, I would be interested in working on this as a first-time contributor. Let me know if this sounds good.

I will also have some more specific questions next week, but for now I just wanted to say I'm interested in working on this.

@krishbox
Copy link
Author

Sounds good @dlb8685, thank you!

@dlb8685
Copy link
Contributor

dlb8685 commented Mar 30, 2021

I think I'm off to a good start on getting tests to run and setting up the repo locally. I'm getting a little stuck on finding the relevant file(s) that convert models yaml to compiled SQL. Are there a couple of pointers on which part of the code base to focus in on?

@jtcohen6
Copy link
Contributor

@dlb8685 It sounds like the change involves a new config option, templated into the create table as statement (docs).

I believe the relevant bits of code for that change would be the redshift__create_table_as macro:

https://github.com/fishtown-analytics/dbt/blob/ce30dfa82d71f8a98e18c9cad61cebe0fc602d48/plugins/redshift/dbt/include/redshift/macros/adapters.sql#L32-L52

And the RedshiftConfig object, which defines the available node configs specific to Redshift features:

https://github.com/fishtown-analytics/dbt/blob/ce30dfa82d71f8a98e18c9cad61cebe0fc602d48/plugins/redshift/dbt/adapters/redshift/impl.py#L12-L16

@dlb8685
Copy link
Contributor

dlb8685 commented Mar 31, 2021

@jtcohen6 Thanks for the help, it definitely made things easier for me. I do think I have a solution now that is working on a little test project I have.

One last thing is, I'm trying to find where in the test files I could add an integration test for this. It looks like this may be the right general area? But I can't figure out where, if at all, there is a project configuration where I can add the backup: false option to something and then write a test to confirm it is working.

https://github.com/fishtown-analytics/dbt/blob/17e57f1e0b3669c055d2a757ab21695c0e86e4db/test/integration/054_adapter_methods_test/test_adapter_methods.py#L27-L32

@jtcohen6
Copy link
Contributor

@dlb8685 Glad to hear it!

The 054_adapter_methods_test is after something a bit more complex. I think you could create a new standalone test, perhaps within 034_redshift_test, that:

  • Includes a new table model
  • Runs dbt and confirms that backup does not appear in the materialization DDL statement
  • Sets backup: false for that table model, perhaps once via config() and once via dbt_project.yml
  • Runs dbt each time, and checks that backup no appears in the materialization DDL statement

As a final check, is there a system table that we can query, to confirm that the table has been opted out of backup?

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Oct 12, 2021
@jtcohen6 jtcohen6 added type:enhancement New feature or request good_first_issue Good for newcomers labels Oct 12, 2021
@jtcohen6
Copy link
Contributor

jtcohen6 commented Oct 12, 2021

@dlb8685 I think the code changes in this repo will be just about identical to the ones you've made in dbt-labs/dbt-core#3221

Pointers:

@dlb8685
Copy link
Contributor

dlb8685 commented Nov 17, 2021

@jtcohen6 I have some changes now in my local fork. Can you remind me which branch in dbt-redshift I should create a PR into? I'm assuming not main, but the other options look pretty random.

image

@jtcohen6
Copy link
Contributor

@dlb8685 main is right! We may not be able to get this in for v1, but that's where we'll want to merge it regardless, for inclusion in a future minor version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good_first_issue Good for newcomers type:enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants