Binary units of measurement express the size of data more accurately. When you compare the size of 100 KB to 100 KiB, the difference is relatively small, about 2.34%. However, this difference grows as the size of the data values increases. When you compare the size of 100 TB to 100 TiB, the difference is about 9.05% [1].
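To make that arithmetic concrete, here is a small sketch, assuming "difference" means how much smaller the decimal unit is relative to the binary one, i.e. 1 − 1000^k / 1024^k for the k-th prefix (K = 1, M = 2, ...). The factor of 100 cancels out, so only the prefix matters. `percent_difference` is a hypothetical helper for illustration.

```python
# Decimal prefixes scale by powers of 1000; binary (IEC) prefixes by powers of 1024.
# The value is the power k applied to each base for that prefix.
PREFIXES = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5}


def percent_difference(prefix: str) -> float:
    """Percent by which 1 <prefix>B (decimal) is smaller than 1 <prefix>iB (binary)."""
    power = PREFIXES[prefix]
    decimal_bytes = 1000 ** power
    binary_bytes = 1024 ** power
    return (1 - decimal_bytes / binary_bytes) * 100


for prefix in PREFIXES:
    print(f"{prefix}B vs {prefix}iB: {percent_difference(prefix):.2f}%")
```

To two decimal places this gives 2.34% for KB vs. KiB and 9.05% for TB vs. TiB, and the gap keeps widening with each larger prefix.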
Removing `format_bytes` and `format_rows_number` from dbt-core was mentioned here and here as part of the discussion for #48.
It feels to me like one of the following should have happened along with #48:
- a simultaneous PR should have removed `format_bytes` and `format_rows_number` from dbt-core; or
- #48 should have been added to dbt-core instead of dbt-bigquery.
As it stands, the implementations in dbt-core are unused (as far as we can tell), but would probably need to be maintained to preserve backwards compatibility in case some unknown adapter is using them.
Proposal
Going forward, I'd propose that we choose one (and only one) of the following:
1. Keep separate implementations in dbt-core and dbt-bigquery (and fix the KB vs. KiB issue in both)
2. Remove the implementations in dbt-core (so they exist only in dbt-bigquery)
3. Completely move the implementation in dbt-bigquery back into dbt-core
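For the second option (removing the implementations from dbt-core), one common way to respect the backwards-compatibility concern is a deprecation period: the old names stay importable for a while but emit a warning when called. The sketch below is hypothetical: the `deprecated` decorator and the simplified `format_rows_number` body are illustrative stand-ins, not dbt-core code, and it uses Python's standard `warnings` module rather than whatever deprecation mechanism dbt-core itself might prefer.

```python
import functools
import warnings


def deprecated(message):
    """Wrap a function so each call emits a DeprecationWarning but still returns its result."""
    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not at this wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("hypothetical notice: it will be removed in a future release")
def format_rows_number(rows_number):
    """Hypothetical, simplified row-count formatter using 1000-based suffixes."""
    for suffix in ["", "k", "m", "b"]:
        if abs(rows_number) < 1000.0:
            return f"{rows_number:3.1f}{suffix}"
        rows_number /= 1000.0
    # Anything at or beyond 1000 billion is reported in trillions.
    return f"{rows_number:3.1f}t"
```

Any unknown adapter that still imports the old name keeps working, while the warning gives its maintainers time to switch before the function is actually removed.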
github-actions bot changed the title from "[Bug] Non-standard abbreviations for kibibyte, etc" to "[CT-1892] [Bug] Non-standard abbreviations for kibibyte, etc" on Jan 24, 2023
Is this a new bug in dbt-bigquery?
Context
IEC 80000-13
Current Behavior
Current code: `dbt-bigquery/dbt/adapters/bigquery/connections.py`, lines 258 to 269 in 3ce88d7.
Expected Behavior
If we use logic like `num_bytes /= 1024.0`, then I'd expect it to align with IEC 80000-13 and use the following standardized abbreviations instead:
(See the chart here for where I got the official abbreviations from.)
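A minimal sketch of this first option, assuming a formatter that loops over units and divides by 1024.0 as in the logic quoted above. The function name `format_bytes_iec` and its unit list are hypothetical, not the actual dbt-bigquery code; the symbols themselves (KiB, MiB, GiB, ...) are the IEC 80000-13 binary prefixes.

```python
# Hypothetical helper, not the actual dbt-bigquery code: because it divides by
# 1024 (binary multiples), it labels results with IEC 80000-13 symbols.
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]


def format_bytes_iec(num_bytes: float) -> str:
    """Format a byte count using 1024-based multiples and IEC binary symbols."""
    for unit in IEC_UNITS[:-1]:
        if abs(num_bytes) < 1024.0:
            return f"{num_bytes:3.1f} {unit}"
        num_bytes /= 1024.0
    # Anything at or beyond 1024 ZiB is reported in the largest unit, YiB.
    return f"{num_bytes:3.1f} {IEC_UNITS[-1]}"
```

With this sketch, 1536 bytes formats as "1.5 KiB" and 10**12 bytes as "931.3 GiB", so the label always matches the 1024-based divisor.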
Alternatively, the current abbreviations for decimal multi-byte units could be kept and these substitutions could be made instead:
and:
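A minimal sketch of this alternative, assuming the same loop structure but dividing by 1000.0 so that the decimal symbols become accurate. `format_bytes_decimal` is a hypothetical name; the sketch keeps the KB spelling used in this thread, although strict SI writes the kilo prefix as a lowercase k (kB).

```python
# Hypothetical helper: keep the decimal symbols (KB, MB, ...) but divide by
# 1000 so that the reported values actually match those symbols.
DECIMAL_UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes_decimal(num_bytes: float) -> str:
    """Format a byte count using 1000-based multiples and decimal symbols."""
    for unit in DECIMAL_UNITS[:-1]:
        if abs(num_bytes) < 1000.0:
            return f"{num_bytes:3.1f} {unit}"
        num_bytes /= 1000.0
    # Anything at or beyond 1000 ZB is reported in the largest unit, YB.
    return f"{num_bytes:3.1f} {DECIMAL_UNITS[-1]}"
```

Here 10**12 bytes formats as "1.0 TB", whereas a 1024-based loop reports the same count as roughly 931.3 in GiB; either convention is fine as long as the divisor and the symbol agree.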
Steps To Reproduce
Didn't actually reproduce -- just read the code!
Relevant log output
No response
Environment
Additional Context
label: tech_debt
Today I Learned (TIL): `format_bytes` and `format_rows_number` are defined in dbt-core! https://github.com/dbt-labs/dbt-core/blob/7b464b8a4957ec7969f19234020e110be1987923/core/dbt/utils.py#L456-L473