feat: source_db package #280

Merged (1 commit) on Sep 28, 2023

Conversation

BfdCampos
Contributor

Description

Tell us about your new package!

Link to your package's repository: https://github.com/BfdCampos/source_db/

Checklist

This checklist is a cut down version of the best practices that we have identified as the package hub has grown. Although meeting these checklist items is not a prerequisite to being added to the Hub, we have found that packages which don't conform provide a worse user experience.

First run experience

  • The package includes a README which explains how to get started with the package and customise its behaviour
  • The README indicates which data warehouses/platforms are expected to work with this package

Customisability

  • The package uses ref or source, instead of hard-coding table references.

Dependencies

Dependencies on dbt Core

  • The package has set a supported require-dbt-version range in dbt_project.yml. Example: A package which depends on functionality added in dbt Core 1.2 should set its require-dbt-version property to [">=1.2.0", "<2.0.0"].

Dependencies on other packages defined in packages.yml:

  • Dependencies are imported from the dbt Package Hub when available, as opposed to a git installation.
  • Dependencies contain the widest possible range of supported versions, to minimise issues in dependency resolution.
  • In particular, dependencies are not pinned to a patch version unless there is a known incompatibility.

Interoperability

  • The package does not override dbt Core behaviour in such a way as to impact other dbt resources (models, tests, etc) not provided by the package.
  • The package uses the cross-database macros built into dbt Core where available, such as {{ dbt.except() }} and {{ dbt.type_string() }}.
  • The package disambiguates its resource names to avoid clashes with nodes that are likely to already exist in a project. For example, packages should not provide a model simply called users.

Versioning

  • (Required): The package's git tags validate against the regex defined in version.py
  • The package's version follows the guidance of Semantic Versioning 2.0.0. (Note in particular the recommendation for production-ready packages to be version 1.0.0 or above)

Signed-off-by: Bruno Campos <brunofdcampos@hotmail.com>
@joellabes
Contributor

Hey @BfdCampos, this one confuses me a bit! I don't really get why you would need to do this - it looks like your goals would be met by using deferral, in particular --favor-state and perhaps explicitly setting a --defer-state

Because of this, I'm wary of adding it to the package hub as it might get other users into a confusing situation where they inadvertently use this instead of taking advantage of the native functionality

Can you tell me more?

@BfdCampos
Contributor Author

Hi @joellabes, thanks so much for reviewing and reaching out!

So this package is designed to dynamically set the source database based on an environment variable. This is useful for conditional database routing but, from what I understand of the documentation, it works differently from the defer feature.

The source_db package lets the user easily switch between databases on the fly based on environment variables, which can be declared with your dbt command or at a global environment level for a multi-tenant db setup. We find a lot of use for this at my company when we need to test a run locally against data in other databases (dev or prod). This is particularly useful for us because we cannot run dbt in production at all, except via automated systems, which means we have no access to the production artefacts.
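To make the routing idea concrete, here is a minimal Python sketch of the decision described above. It is not the package's code (the package itself is a dbt package, and its implementation is not shown in this thread); the variable name `DBT_SOURCE_DB` and the function `resolve_source_database` are hypothetical names chosen only for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_source_database(
    default_database: str,
    env: Optional[Mapping[str, str]] = None,
    env_var: str = "DBT_SOURCE_DB",  # hypothetical name, not taken from the package
) -> str:
    """Return the database that sources should be read from.

    If the environment variable is set to a non-empty value, it wins;
    otherwise the project's default database is used.
    """
    # Allow a mapping to be injected so the logic can be tested
    # without touching the real process environment.
    env = os.environ if env is None else env
    override = env.get(env_var, "").strip()
    return override or default_database
```

Under these assumptions, calling `resolve_source_database("analytics_dev", env={"DBT_SOURCE_DB": "analytics_prod"})` would select `analytics_prod`, while an unset or empty variable falls back to `analytics_dev`. No manifest or other artefact from a previous run is involved in the decision.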

From what I understood, the defer feature is particularly useful for optimising computational resources in CI. It switches between databases or schemas based on whether a model exists in the current environment, automatically referring to the production model if a development one does not exist. However, it requires that a manifest from a previous dbt invocation be passed to the --state flag or env var, which is not possible for us. So unfortunately this feature does not work for our use case.
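For contrast, the defer behaviour as summarised in the paragraph above can be sketched in Python as follows. This is a deliberately simplified model of that description, not dbt's implementation: real deferral has further rules (for example around selected nodes and --favor-state), and the names `resolve_ref`, `dev_relations` and `state_manifest` are hypothetical.

```python
from typing import Dict, Set


def resolve_ref(
    model: str,
    dev_relations: Set[str],
    state_manifest: Dict[str, str],
    dev_schema: str = "dev",
) -> str:
    """Pick the relation a reference to `model` would resolve to.

    dev_relations:  models that already exist in the current environment.
    state_manifest: model name -> fully qualified relation, standing in for
                    the manifest of a previous (e.g. production) invocation.
    """
    # A model built in the current environment is used directly.
    if model in dev_relations:
        return f"{dev_schema}.{model}"
    # Otherwise fall back to the relation recorded in the earlier manifest.
    if model in state_manifest:
        return state_manifest[model]
    # Without a manifest entry there is nothing to defer to.
    raise KeyError(f"{model!r} not found in current environment or state manifest")
```

The point of the sketch is the second argument: the fallback only works when `state_manifest` is available, which is the requirement that cannot be met when production artefacts are inaccessible.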

In summary, the source_db package allows for conditional logic based on environment variables, enabling more complex routing that defer does not offer at this time.

I would also be happy if this feature was taken up by dbt as a default command if you think it adds value 😊

(Anecdotal but the main reason why I even made this PR was because I've already sent the exact copy of the code for the package to 4 friends at different companies because they found it useful, so I thought instead of having to send it manually, why not make it into a package?)

@joellabes
Contributor

Works for me! It might be worth adding a bit more of that context to your readme to help folks understand the contexts where it's useful, but let's get this merged!

@joellabes joellabes merged commit 289962a into dbt-labs:main Sep 28, 2023
@BfdCampos
Contributor Author

Perfect will do now!! Thanks so much @joellabes 🙌🙌

@BfdCampos
Contributor Author

@joellabes apologies for the delay. This week has gotten the better of me. But I have added the notes to the README as you suggested and made another release so that the package has the latest information. Thanks 😊

joellabes added a commit that referenced this pull request Oct 2, 2023