Convert to using mashumaro jsonschema with acceptable performance #8437
Codecov Report
Patch coverage:
@@           Coverage Diff           @@
##             main    #8437   +/-   ##
=======================================
  Coverage   86.34%   86.34%
=======================================
  Files         174      174
  Lines       25579    25531   -48
=======================================
- Hits        22087    22046   -41
+ Misses       3492     3485    -7
 )
 pre_hook: List[Hook] = field(
     default_factory=list,
-    metadata=MergeBehavior.Append.meta(),
+    metadata={"merge": MergeBehavior.Append, "alias": "pre-hook"},
Why are aliases needed for Append now? Why doesn't the packages field need one on line 466?
The "alias" is for handling the dashes in the names properly. Most of the other field definitions use that kind of hacky metadata=MergeBehavior.DictKeyAppend.meta() thing, which doesn't allow setting additional metadata.
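The point about metadata can be sketched with plain dataclasses. This assumes, as the reply implies, that the `.meta()` helper only yields the merge entry, so a field that also needs an alias has to spell out the whole dict. `MergeBehavior` below is a hypothetical stand-in for dbt's enum, and `from_aliased_dict` is a toy loader, not mashumaro; as we understand it, mashumaro itself reads an `alias` key from field metadata when deserializing.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Any, Dict, List, Type, TypeVar

T = TypeVar("T")


class MergeBehavior(Enum):
    """Hypothetical stand-in for dbt's MergeBehavior enum."""
    Append = 1
    Update = 2
    Clobber = 3


@dataclass
class HookConfigSketch:
    # One metadata dict carries both the merge rule and the dashed key name.
    pre_hook: List[str] = field(
        default_factory=list,
        metadata={"merge": MergeBehavior.Append, "alias": "pre-hook"},
    )
    post_hook: List[str] = field(
        default_factory=list,
        metadata={"merge": MergeBehavior.Append, "alias": "post-hook"},
    )
    # No dash in this config key, so no alias is needed.
    tags: List[str] = field(
        default_factory=list,
        metadata={"merge": MergeBehavior.Append},
    )


def from_aliased_dict(cls: Type[T], data: Dict[str, Any]) -> T:
    """Build a dataclass, reading each field from its alias when one is declared."""
    kwargs = {}
    for f in fields(cls):
        key = f.metadata.get("alias", f.name)
        if key in data:
            kwargs[f.name] = data[key]
    return cls(**kwargs)


def merge_behavior_of(cls: type, name: str) -> MergeBehavior:
    """Look up a field's merge rule, defaulting to Clobber (an assumption)."""
    for f in fields(cls):
        if f.name == name:
            return f.metadata.get("merge", MergeBehavior.Clobber)
    raise KeyError(name)
```

With an alias declared, only the dashed key is read: `from_aliased_dict(HookConfigSketch, {"pre_hook": ["x"]})` leaves `pre_hook` empty, which is why both spellings need separate handling.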
@@ -72,12 +72,12 @@
# ----
# These are major-version-0 packages also maintained by dbt-labs. Accept patches.
"dbt-extractor~=0.5.0",
"hologram~=0.0.16",  # includes transitive dependencies on python-dateutil and jsonschema
🎉
🎊
@jtcohen6 @graciegoheen I tagged you on this because the output JSON schema differs from what hologram generated in a number of ways. Do people actually read it? Do we have any concerns there? For example, resource_type shows up as a const, and the use of oneOf vs. anyOf is different.
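To make the oneOf/anyOf difference concrete: in JSON Schema, anyOf accepts an instance that matches at least one branch, while oneOf requires exactly one, so a oneOf whose branches overlap rejects values that match more than one. A const pins a property to a single value, which is one way to express a field like resource_type. The toy checker below is a hand-rolled illustration of just those keywords, not the jsonschema library and not the schema dbt actually emits.

```python
from typing import Any, Dict

# Minimal type table for this sketch; real JSON Schema has more rules.
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
    "object": dict,
    "array": list,
}


def is_valid(instance: Any, schema: Dict[str, Any]) -> bool:
    """Check a tiny subset of JSON Schema: type, const, properties, anyOf, oneOf."""
    if "const" in schema and instance != schema["const"]:
        return False
    if "type" in schema:
        expected = schema["type"]
        if not isinstance(instance, _TYPES[expected]):
            return False
        # bool subclasses int in Python, but is not a JSON Schema number.
        if expected in ("integer", "number") and isinstance(instance, bool):
            return False
    if isinstance(instance, dict):
        for key, sub in schema.get("properties", {}).items():
            if key in instance and not is_valid(instance[key], sub):
                return False
    # anyOf: at least one branch must match.
    if "anyOf" in schema and not any(is_valid(instance, s) for s in schema["anyOf"]):
        return False
    # oneOf: exactly one branch must match, so overlapping branches can reject.
    if "oneOf" in schema and sum(is_valid(instance, s) for s in schema["oneOf"]) != 1:
        return False
    return True
```

For example, with branches `[{"type": "string"}, {"const": "abc"}]`, the value `"abc"` passes under anyOf but fails under oneOf, while `"xyz"` passes both.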
@gshank I think that's fine, as long as this is a forward-looking change for new versions of We know that the jsonschemas generated by hologram were not always even technically correct, which could lead to edge cases if used for programmatic validation (e.g. #4657). I am hoping that the ones produced by I just want to clarify that:
cc @dbt-labs/cloud-artifacts for visibility
That's right, the other schemas will change too. Should I update them now? Probably nothing has changed as far as validation is concerned. Or should we wait for an actual change and just verify that newly generated schemas still work?
@gshank Good point re: artifacts that won't actually be changing their schema in v1.7 (most likely
I have a few questions, no serious concerns, and multiple nits (take them or leave them, just things I noticed).
    return updated

def translate_hook_names(self, project_dict):
    # This is a kind of kludge because the fix for #6411 specifically allowed misspelling
If this is not the intended input format, should we raise a warning here indicating that? I wouldn't make anything fail, but providing some direction would make it easier for us to deprecate the incorrect spelling in the future (likely one less thing for folks to change for 2.0).
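A non-fatal warning along these lines could be a small tree walk over the config. This is only a sketch: it uses Python's standard `warnings` module with `DeprecationWarning`, whereas dbt has its own deprecation and event machinery, and the key set and function name are hypothetical.

```python
import warnings
from typing import Any, Dict, List

# Hypothetical spellings to flag; the '+'-prefixed forms are an assumption
# about how these configs appear in dbt_project.yml.
_SNAKE_CASE_HOOK_KEYS = {"pre_hook", "post_hook", "+pre_hook", "+post_hook"}


def warn_on_snake_case_hooks(config: Dict[str, Any], path: str = "") -> List[str]:
    """Walk a nested config, warn once per snake_case hook key, return their paths."""
    found: List[str] = []
    for key, value in config.items():
        here = f"{path}.{key}" if path else str(key)
        if key in _SNAKE_CASE_HOOK_KEYS:
            found.append(here)
            warnings.warn(
                f"{here}: snake_case hook key is accepted, but the dashed spelling is preferred",
                DeprecationWarning,
                stacklevel=2,
            )
        if isinstance(value, dict):
            found.extend(warn_on_snake_case_hooks(value, here))
    return found
```

Python hides `DeprecationWarning` by default outside `__main__`, which is another reason a real version would more likely go through dbt's own deprecation events.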
That ticket specifically allowed the "incorrect" spellings, so it's now a feature.
There's no :lolsob: emoji. Why is there no :lolsob: emoji when I need one so badly?
That being said, we don't intend to ever migrate folks off of the "incorrect" spelling either?
You'd have to ask product and Doug :). If you want to open a ticket, go ahead. Not in scope for this one though...
The misspelling we mean here is that we'll accept either kebab case or snake case for these two configs, in the several places where they could potentially be defined:
post-hook or post_hook
pre-hook or pre_hook
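Accepting both spellings mostly comes down to normalizing keys before deserialization, so a loader keyed on the dashed alias only ever sees one form. The sketch below assumes that direction (snake_case rewritten to the dashed alias) and recurses through nested dicts and lists; it is not the actual translate_hook_names, and the names are hypothetical.

```python
from typing import Any, Dict

# Hypothetical mapping from the snake_case spellings to the dashed aliases.
_HOOK_RENAMES = {"pre_hook": "pre-hook", "post_hook": "post-hook"}


def normalize_hook_keys(node: Any) -> Any:
    """Return a copy of node with snake_case hook keys renamed to the dashed form.

    A leading '+' (as used for configs in dbt_project.yml) is preserved.
    """
    if isinstance(node, dict):
        out: Dict[Any, Any] = {}
        for key, value in node.items():
            prefix = "+" if isinstance(key, str) and key.startswith("+") else ""
            bare = key[1:] if prefix else key
            new_key = prefix + _HOOK_RENAMES[bare] if bare in _HOOK_RENAMES else key
            out[new_key] = normalize_hook_keys(value)
        return out
    if isinstance(node, list):
        return [normalize_hook_keys(item) for item in node]
    return node
```

If a dict contains both spellings of the same hook, the later key silently wins here; a real implementation would probably want to merge or reject that case.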
Re: accepting either kebab case or snake case for these two configs: agreed. I'm asking whether we ever want to back out of that ditch or support it for the foreseeable future.
# Check that catalog validates with jsonschema
catalog_dict = catalog.to_dict()
try:
I can't explain it, but this feels like an odd flow to me. Would something like this work?
assert catalog.validate(catalog_dict), "Catalog validation failed"
or even
assert catalog.validate(catalog.to_dict()), "Catalog validation failed"
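Whether the one-line assert works depends on what validate returns. If it returns a truthy value on success, the assert reads well; if it follows a raise-on-failure convention and returns None (an assumption here, worth checking against dbt's schema mixin), then `assert catalog.validate(...)` would fail even for a valid catalog. In that case the exception itself can carry the failure, as in this sketch with hypothetical stand-in classes:

```python
from typing import Any, Dict


class ValidationError(Exception):
    """Hypothetical stand-in for the schema library's validation error."""


class CatalogSketch:
    """Hypothetical artifact: validate() raises on bad data, returns None otherwise."""

    REQUIRED_KEYS = ("metadata", "nodes", "sources")

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = data

    def to_dict(self) -> Dict[str, Any]:
        return dict(self.data)

    @classmethod
    def validate(cls, data: Dict[str, Any]) -> None:
        missing = [key for key in cls.REQUIRED_KEYS if key not in data]
        if missing:
            raise ValidationError(f"missing keys: {missing}")


def assert_catalog_valid(catalog: CatalogSketch) -> None:
    """Turn a raised ValidationError into a readable test failure."""
    try:
        catalog.validate(catalog.to_dict())
    except ValidationError as exc:
        raise AssertionError(f"Catalog validation failed: {exc}") from exc
```

Under pytest, a bare `catalog.validate(catalog.to_dict())` with no try/except also fails the test when it raises; the wrapper only adds a clearer message.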
@@ -81,6 +82,10 @@ def _assert_freshness_results(self, path, state):
with open(path) as fp:
    data = json.load(fp)

try:
Same comment as in test_docs_generate_defer.
* Add compiled node properties to run_results.json
* Include compiled-node attributes in run_results.json
* Fix typo
* Bump schema version of run_results
* Fix test assertions
* Update expected run_results to reflect new attributes
* Code review changes
* Fix mypy warnings for ManifestLoader.load() (#8443)
* revert python version for docker images (#8445)
* revert python version for docker images
* add comment to not update python version, update changelog
* Bumping version to 1.7.0b1 and generate changelog
* [CT-3013] Fix parsing of `window_groupings` (#8454)
* Update semantic model parsing tests to check measure non_additive_dimension spec
* Make `window_groupings` default to empty list if not specified on `non_additive_dimension`
* Add changie doc for `window_groupings` parsing fix
* update `Number` class to handle integer values (#8306)
* add show test for json data
* oh changie my changie
* revert unecessary cahnge to fixture
* keep decimal class for precision methods, but return __int__ value
* jerco updates
* update integer type
* update other tests
* Update .changes/unreleased/Fixes-20230803-093502.yaml
---------
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
* Improve docker image README (#8212)
* Improve docker image README
  - Fix unnecessary/missing newline escapes
  - Remove double whitespace between parameters
  - 2-space indent for extra lines in image build commands
* Add changelog entry for #8212
* ADAP-814: Refactor prep for MV updates (#8459)
* apply reformatting changes only for #8449
* add logging back to get_create_materialized_view_as_sql
* changie
* swap trigger (#8463)
* update the implementation template (#8466)
* update the implementation template
* add colon
* Split tests into classes (#8474)
* add flaky decorator
* split up tests into classes
* revert update agate for int (#8478)
* updated typing and methods to meet mypy standards (#8485)
* Convert error to conditional warning for unversioned contracted model, fix msg format (#8451)
* first pass, tests need updates
* update proto defn
* fixing tests
* more test fixes
* finish fixing test file
* reformat the message
* formatting messages
* changelog
* add event to unit test
* feedback on message structure
* WIP
* fix up event to take in all fields
* fix test
* Fix ambiguous reference error for duplicate model names across packages with tests (#8488)
* Safely remove external nodes from manifest (#8495)
* [CT-2840] Improved semantic layer protocol satisfaction tests (#8456)
* Test `SemanticModel` satisfies protocol when none of it's `Optionals` are specified
* Add tests ensuring SourceFileMetadata and FileSlice satisfiy DSI protocols
* Add test asserting Defaults obj satisfies protocol
* Add test asserting SemanticModel with optionals specified satisfies protocol
* Split dimension protocol satisfaction tests into with and without optionals
* Simplify DSI Protocol import strategy in protocol satisfaction tests
* Add test asserting DimensionValidtyParams satisfies protocol
* Add test asserting DimensionTypeParams satisfies protocol
* Split entity protocol satisfaction tests into with and without optionals
* Split measure protocol satisfication tests and add measure aggregation params satisficaition test
* Split metric protocol satisfaction test into optional specified an unspecified
  Additionally, create where_filter pytest fixture
* Improve protocol satisfaction tests for MetricTypeParams and sub protocols
  Specifically we added/improved protocol satisfaction tests for
  - MetricTypeParams
  - MetricInput
  - MetricInputMeasure
  - MetricTimeWindow
* Convert to using mashumaro jsonschema with acceptable performance (#8437)
* Regenerate run_results schema after merging in changes from main.
---------
Co-authored-by: Gerda Shank <gerda@dbtlabs.com>
Co-authored-by: Matthew McKnight <91097623+McKnight-42@users.noreply.github.com>
Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Quigley Malcolm <QMalcolm@users.noreply.github.com>
Co-authored-by: dave-connors-3 <73915542+dave-connors-3@users.noreply.github.com>
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
Co-authored-by: Jaime Martínez Rincón <jaime@jamezrin.name>
Co-authored-by: Mike Alfare <13974384+mikealfare@users.noreply.github.com>
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
resolves #8426
Problem
The original conversion was done in #8132, but it had performance issues. This PR uses caching to improve performance.
See the comments in #8132 for additional context when reviewing the code.
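The caching idea can be sketched as memoizing the generated schema per class, so that (presumably) validating many objects of the same type pays the generation cost only once. The mixin, the counter, and the toy schema builder below are hypothetical and only illustrate the pattern; they are not what this PR necessarily caches or how.

```python
from dataclasses import dataclass, fields
from typing import Any, ClassVar, Dict


class SchemaCacheMixin:
    """Hypothetical mixin that memoizes an expensive per-class JSON schema build."""

    _schema_cache: ClassVar[Dict[type, Dict[str, Any]]] = {}
    builds: ClassVar[int] = 0  # counts real builds, for illustration only

    @classmethod
    def _build_json_schema(cls) -> Dict[str, Any]:
        # Stand-in for a costly generator that walks the dataclass fields.
        SchemaCacheMixin.builds += 1
        return {
            "title": cls.__name__,
            "type": "object",
            "properties": {f.name: {} for f in fields(cls)},
        }

    @classmethod
    def json_schema(cls) -> Dict[str, Any]:
        # Key by the concrete class so each subclass gets its own schema.
        schema = SchemaCacheMixin._schema_cache.get(cls)
        if schema is None:
            schema = cls._build_json_schema()
            SchemaCacheMixin._schema_cache[cls] = schema
        return schema


@dataclass
class ManifestNodeSketch(SchemaCacheMixin):
    name: str
    resource_type: str = "model"
```

The cached dict is shared between callers, so it should be treated as read-only; a validator object built from the schema could be cached the same way.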
Checklist