[CT-2176] [Feature] Report bytes processed for tests #559

Closed
3 tasks done
bruno-szdl opened this issue Feb 24, 2023 · 2 comments
Labels
feature:cost-reduction Issues related to cost tracking in BigQuery type:enhancement New feature or request

Comments

@bruno-szdl
Contributor

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

As @elyobo commented in this closed issue #14 (comment), it would be very nice to see the bytes processed by running tests in BigQuery.

Tests can process a lot of bytes, especially tests like unique and not_null configured without a where filter, because they then scan the whole table.

Since BQ billing is based on these bytes, an expensive bill can come as a surprise, because you have no visibility into what your tests cost while developing. It would be fantastic to see the cost of each test in the logs, as we already do for models.
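To make the link between bytes and cost concrete, here is a rough sketch of the arithmetic. The function name and the price argument are hypothetical; the price per TiB is something you would look up for your own billing setup, and the sketch ignores any per-query minimums or rounding that BigQuery applies to billed bytes.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_query_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Rough cost estimate for one query.

    price_per_tib is an assumption supplied by the caller, not a value
    taken from dbt or BigQuery. Minimum charges and rounding are ignored.
    """
    return bytes_processed / TIB * price_per_tib
```

For example, a test that scans half a tebibyte at a hypothetical price of 5.0 per TiB would be estimated at 2.5 in the same currency.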

Describe alternatives you've considered

Currently, bytes_processed for models is taken from the total_bytes_processed attribute of the query_job object:
https://github.com/dbt-labs/dbt-bigquery/blob/main/dbt/adapters/bigquery/connections.py#L491

I am assuming the query_job information comes from INFORMATION_SCHEMA.JOBS. I checked that view, and jobs that run tests also have a value for total_bytes_processed.

I don't understand why tests don't report the bytes processed in run_results.json, as I didn't find any restriction in this code:
https://github.com/dbt-labs/dbt-bigquery/blob/main/dbt/adapters/bigquery/connections.py
Maybe I am looking at the wrong code.

Anyway, the information is available and we already do this for models, so I think we can do this for tests.

Who will this benefit?

People working with dbt on BigQuery, as they can better monitor the costs caused by dbt tests.

Are you interested in contributing this feature?

Yes, I am interested. I am not sure where the information for test results is written.

Anything else?

No response

@bruno-szdl bruno-szdl changed the title [Feature] Report bytes processed for tests [Feature] Report bytes processed by running tests Feb 24, 2023
@github-actions github-actions bot changed the title [Feature] Report bytes processed by running tests [CT-2176] [Feature] Report bytes processed for tests Feb 24, 2023
@jtcohen6
Contributor

Good news! Starting in v1.5 (currently available as a beta prerelease), this information will also be available for tests and source freshness checks, thanks to this contribution:

You can try it out!

pip install dbt-bigquery~=1.5.0b1
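Once tests report this value, one way to keep an eye on it is to total the bytes per resource type from run_results.json. This is only a sketch: it assumes each entry in `results` has a `unique_id` such as `test.my_project.some_test.abc123` and an `adapter_response` dictionary that may contain `bytes_processed`. The helper names are made up for illustration.

```python
import json
from collections import defaultdict
from pathlib import Path


def bytes_by_resource_type(run_results: dict) -> dict:
    """Sum adapter_response.bytes_processed per resource type (model, test, ...)."""
    totals = defaultdict(int)
    for result in run_results.get("results", []):
        # Assumed layout: unique_id starts with the resource type, e.g. "test.".
        resource_type = result.get("unique_id", "unknown").split(".", 1)[0]
        adapter_response = result.get("adapter_response") or {}
        # bytes_processed may be missing or None; count those entries as 0.
        totals[resource_type] += adapter_response.get("bytes_processed") or 0
    return dict(totals)


def load_run_results(path: str = "target/run_results.json") -> dict:
    """Read the artifact dbt writes after a run (default path is an assumption)."""
    return json.loads(Path(path).read_text())
```

After a `dbt build`, calling `bytes_by_resource_type(load_run_results())` should give one total per resource type, which makes it easy to see how much of a run's scanning comes from tests.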

@bruno-szdl
Contributor Author

Great! I am trying to better monitor the BigQuery costs of our clients' projects, and that was exactly what I needed! Thanks @jtcohen6 for pointing out this contribution.

I will close this issue now.
