Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package best practices doc #139

Merged
merged 8 commits into from
Aug 3, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
First draft of package best practices
  • Loading branch information
joellabes committed Jul 28, 2022
commit e8f3e827919db0390eb102fb9e19cd8383668938
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,4 +14,4 @@ Assorted constraints:
### Adding your package to hubcap
Currently, only packages hosted on a GitHub repo are supported.

To add your package, open a PR that adds your repository to [hub.json](hub.json).
To add your package, open a PR that adds your repository to [hub.json](hub.json). A dbt Labs team member will review your PR and provide a cursory check of your new package against [best practices](package-best-practices.md).
joellabes marked this conversation as resolved.
Show resolved Hide resolved
41 changes: 41 additions & 0 deletions package-best-practices.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
# Best practices when submitting a package for public use

Packages are a key part of the dbt experience. Practitioners and vendors alike can [contribute to the knowledge loop](https://github.com/dbt-labs/corp/blob/main/values.md#we-contribute-to-the-knowledge-loop) by enabling compelling, batteries-included use cases.
joellabes marked this conversation as resolved.
Show resolved Hide resolved

There are hundreds of packages on the dbt Package Hub already, and over time we have identified a handful of common mistakes that new package authors make as they take a self-contained package and prepare it for publication. We have listed these below to help you and your users get the best experience.

Although the dbt Labs team completes a cursory review of new packages before adding them to the Package Hub, ultimately you are responsible for your package's performance.

To avoid ambiguity, the bullets below are pretty formal, but we are a friendly bunch! If you need help preparing a package for deployment to the dbt Package Hub, reach out in [#package-ecosystem](https://getdbt.slack.com/archives/CU4MRJ7QB/) in the [dbt Community Slack](https://getdbt.com/community).
joellabes marked this conversation as resolved.
Show resolved Hide resolved

---

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).

### First run experience
- Packages SHOULD contain a README file which explains how to get started with the package and customise its behaviour.
joellabes marked this conversation as resolved.
Show resolved Hide resolved

### Customisability
- Packages MUST NOT hard-code table references, and MUST use `ref` or `source` instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking commentary.

The current text definitely conveys the spirit of the guidance, but an overly lawyerly maintainer could hard-code a view reference and claim full compliance.

Two opportunities for further clarity:

  • table → relation
  • use applicable adapter methods rather than directly utilizing describe, show, or information_schema

I'm fuzzy on the detailed guidance to accomplish the latter, so that's why I don't have an explicit suggestion that covers both opportunities.

Our glossary doesn't include an entry "relation" currently, it only has the following disparate entries:

The closest documentation I could find quickly focuses on the definition and usage of the Relation Python object:

- Packages MAY provide a mechanism (such as variables) to enable/disable certain subsets of functionality.
- Packages for data transformation:
- SHOULD provide a mechanism (such as variables) to customise the location of source tables.
- SHOULD NOT assume database/schema names in sources.
- MAY assume table names, particularly if the package was built to support tables created by a known tool.

### Dependencies
- Packages requiring a minimum version of `dbt-core` MUST declare it in the `require-dbt-version` property of `dbt_project.yml`.
- Packages requiring a minimum version of `dbt-core` SHOULD declare compatibility with all subsequent minor and patch releases of that major version.
- For example, a package which depends on functionality added in dbt Core 1.2 SHOULD set its `require-dbt-version` property to `[">=1.2.0", "<2.0.0"]`.
Packages with a dependency on another package:
- MAY, if the dependent package is major-version-zero, pin to the current minor version only.
dbeatty10 marked this conversation as resolved.
Show resolved Hide resolved
- SHOULD NOT pin to a patch version of the dependent package unless they are aware of an incompatibility between their package and the dependency.
- SHOULD reference the version in the dbt Package Hub when available, as opposed to a `git` installation.

### Interoperability
- Packages MUST NOT override dbt Core behaviour in such a way as to impact other dbt resources (models, tests, etc) not provided by the package.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtcohen6 I wrote this thinking about the aftermath of that Fivetran package that overrode freshness collection and had unfortunate side effects.

But, if someone like this community member turned their gist into a package, that would be an acceptable overriding of behaviour yeah? Does this rule need a bit of wordsmithing or do we just drop it down to REALLY SHOULD NOT per RFC 6919?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for REALLY SHOULD NOT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REALLY SHOULD NOT doesn't actually exist 😢 (discussed above) - should've made clearer that this was a joke

- Packages SHOULD disambiguate their resource names to avoid clashes with nodes that are likely to already exist in a project.
- For example, packages SHOULD NOT provide a model simply called `users`.

### Updates
- Packages SHOULD follow the guidance of the [Semantic Versioning Specification](https://semver.org/).
joellabes marked this conversation as resolved.
Show resolved Hide resolved
joellabes marked this conversation as resolved.
Show resolved Hide resolved