Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2455] [Feature] Less strict secret env vars: allow elsewhere besides just profiles.yml + packages.yml #7425

Closed
3 tasks done
jeremyyeo opened this issue Apr 20, 2023 · 5 comments
Labels
enhancement New feature or request stale Issues that have gone stale wontfix Not a bug or out of scope for dbt-core

Comments

@jeremyyeo
Copy link
Contributor

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

I think this one just calls for a reversal of the decision made in #4310

One example where you might want to use secret env vars is when using Snowflake's encrypt_x / decrypt_x functions... e.g.

-- models/pii.sql
select encrypt('john', '{{ env_var("DBT_ENV_SECRET_PWD") }}') as name

Which due to the issue/pr above results in the expected error:

Parsing Error
  Secret env vars are allowed only in profiles.yml or packages.yml. Found 'DBT_ENV_SECRET_PWD' referenced elsewhere.

Describe alternatives you've considered

I don't think any good alternatives exist.

  1. You use a normal var - which exposes the passphrase in plaintext.
  2. On snowflake - you could store the passphrase in a table - setup datamasking/role based access and what not - kinda convoluted.

Who will this benefit?

Anyone wanting to use secret vars in places that are not just those 2 files.

Are you interested in contributing this feature?

Sure

Anything else?

No response

@jeremyyeo jeremyyeo added enhancement New feature or request triage labels Apr 20, 2023
@github-actions github-actions bot changed the title [Feature] Less strict secret env vars: allow elsewhere besides just profiles.yml + packages.yml [CT-2455] [Feature] Less strict secret env vars: allow elsewhere besides just profiles.yml + packages.yml Apr 20, 2023
@jtcohen6
Copy link
Contributor

jtcohen6 commented Apr 21, 2023

@jeremyyeo There's a workaround for this today:

  • Create two env vars, one with the "secret" prefix and one without (e.g. DBT_ENV_SECRET_PWD and SECRET_PWD) set to the same value
  • Use the "non-secret" env var inside the SQL file
-- models/pii.sql
select encrypt('john', '{{ env_var("SECRET_PWD") }}') as name

The value will be scrubbed from dbt's logs:

$ SECRET_PWD=abc123 DBT_ENV_SECRET_PWD=abc123 dbt compile -s pii
12:20:52  Running with dbt=1.5.0-rc2
...
12:20:55  Compiled node 'pii' is:
-- models/pii.sql
select encrypt('john', '*****') as name
create or replace   view analytics.dbt_jcohen.pii
  
   as (
    -- models/pii.sql
select encrypt('john', '*****') as name
  );

And for a completely separate reason, they will also be scrubbed from Snowflake's query logs, because of the nature of the function (encrypt) being used:

Screenshot 2023-04-21 at 14 07 56

Screenshot 2023-04-21 at 14 16 06

The downside of this approach is, by also storing the value in a non-secret env var, there's nothing preventing someone with development access to the same project from writing that value to a database table (outside a function like encrypt that Snowflake knows to scrub), and then querying the table to read it back out. That's not possible with a "secret" env var due to the intentional limitations around where it can be used/rendered.

Longer term, we should really reconsider how we treat secrets (#6353). Currently, it's just a special behavior for scrubbing a value from the dbt (logs) if it matches an environment variable with a certain prefix.

I'm going to close this specific ask as a wontfix in the meantime.

@jtcohen6 jtcohen6 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Apr 21, 2023
@jtcohen6 jtcohen6 closed this as not planned Won't fix, can't repro, duplicate, stale Apr 21, 2023
@jeremyyeo
Copy link
Contributor Author

I guess this may be one that could be more dbt Cloud related? But is directly the result of dbt-core not supporting secrets in other places (one of those "cross-functional" thingies =]). Why would one use secret env vars in the first place?
a) You don't want it logged in logs.
b) You don't want all dbt developers to know the secret value in the first place.

I think what you mentioned addresses (a) but not (b)?

@jeremyyeo jeremyyeo reopened this Apr 21, 2023
@jeremyyeo
Copy link
Contributor Author

I wrote this elsewhere but worth putting down here too - recapping why simply putting env vars in model files doesn't solve the problem at hand.

Let's assume we want to use the encrypt function (for simplicity).

And assuming a toy example:

-- some_model.sql
select encrypt('john', 'correct horse battery staple') as name

Okay, so when we build this table in Snowflake - what DDL/DML does dbt send? First things first... dbt has to compile the above model into working and proper SQL:

-- target/compiled/some_model.sql
create table some_model as 
select encrypt('john', 'correct horse battery staple') as name;

Now... let's assume for a second that dbt can have secret env vars in model files:

-- some_model.sql
select encrypt('john', '{{ env_var("DBT_ENV_SECRET_PASSWORD") }}') as name

Now... what happens?

Firstly, dbt will always have to compile the sql model file and output it to the target folder - just like above. So - if dbt were to MASK the values in the compiled SQL:

-- target/compiled/some_model.sql
create table some_model as 
select encrypt('john', '!!!!*****!!!!') as name;

^ Then you table some_model will literally be using the string '!!!!*****!!!!' as the encryption passphrase - as opposed to using the env var set by DBT_ENV_SECRET_PASSWORD (which would have been equal to the string 'correct horse battery staple').

Because dbt always always has to compile to valid SQL first - that's kind of the rationale that secret env vars were explicitly blocked from model files.


Separately I went about testing out using Snowflake Data Masking for this.

image

I created a secret table - whose value is only visible to TRANSFORMER and not DEVELOPER - I then tried to use the DEVELOPER role to try and encrypt it's value.

image (1)

It failed with the passphrase 'correct horse battery staple'.

image (2)

But succeeded with the passphrase that was 5 asterisk - the masked value. Meaning it had been encrypted not with 'correct horse battery staple' but '*****' - so unfortunately this looks like a dead end as well unfortunately :(

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Oct 25, 2023
Copy link
Contributor

github-actions bot commented Nov 1, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request stale Issues that have gone stale wontfix Not a bug or out of scope for dbt-core
Projects
None yet
Development

No branches or pull requests

2 participants