Switch from hologram to mashumaro jsonschema #8132
Note: this requires changes to mashumaro, so it will not work correctly until those changes are released and we can pull them in.
mashumaro tickets: Fatal1ty/mashumaro#126
Looks like all of the
@Fatal1ty I've been on vacation and busy with other work, so sorry for not getting back to you. Looking forward to the release!
Codecov Report
@@            Coverage Diff             @@
##             main    #8132      +/-   ##
==========================================
+ Coverage   86.23%   86.25%   +0.01%
==========================================
  Files         174      174
  Lines       25518    25475      -43
==========================================
- Hits        22005    21973      -32
+ Misses       3513     3502      -11
It looks like this performs much worse than hologram's json_schema generation. The main difference is that hologram cached the field lists in one class variable and the schemas in another. There are a few things to try here: 1) caching the json_schemas in a class, 2) pre-generating the json_schemas and using them statically, 3) figuring out how to have mashumaro cache the fields.
Closing this for now; work on the performance/caching issue will continue in another ticket. This branch will be used as the basis for the additional work.
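Option 1 (caching the schemas in a class) could look roughly like the sketch below. It uses only the standard library: `build_schema`, `CachedSchemaMixin`, and `Credentials` are hypothetical names invented for illustration, and `build_schema` is a stand-in for the real (expensive) schema generator, not mashumaro's or hologram's API.

```python
from dataclasses import dataclass, fields
from typing import Any, ClassVar, Dict

# Counts how often the (pretend) expensive builder runs, for demonstration only.
BUILD_CALLS = 0


def build_schema(cls) -> Dict[str, Any]:
    """Hypothetical stand-in for an expensive JSON schema generator."""
    global BUILD_CALLS
    BUILD_CALLS += 1
    return {
        "type": "object",
        "title": cls.__name__,
        "properties": {f.name: {} for f in fields(cls)},
    }


class CachedSchemaMixin:
    # A single cache shared by all subclasses, keyed by the concrete class,
    # so each class pays the generation cost at most once.
    _schema_cache: ClassVar[Dict[type, Dict[str, Any]]] = {}

    @classmethod
    def json_schema(cls) -> Dict[str, Any]:
        cache = CachedSchemaMixin._schema_cache
        if cls not in cache:
            cache[cls] = build_schema(cls)
        return cache[cls]


@dataclass
class Credentials(CachedSchemaMixin):
    host: str
    port: int


first = Credentials.json_schema()
second = Credentials.json_schema()  # served from the cache, no rebuild
```

Keying the cache by class rather than storing it in a per-class attribute avoids a subclass accidentally inheriting its parent's cached schema.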
resolves #6776
Problem
We want to remove the dependency on hologram, which we have been using for jsonschema generation and validation.
Solution
Switch to using mashumaro jsonschema generation
Checklist
One of the main changes to functionality is that mashumaro uses the "alias" field metadata option to construct both "from_dict" and the jsonschema, so it automatically uses dashes for our many fields that use dashes in yaml and underscores in Python (mostly pre-hook and post-hook in configs and a bunch of fields in dbt_project). In addition, we are now setting a to_dict option to use the dash forms when serializing. This means that many of our kludges to handle the conversion to and from dashes are no longer necessary, but it also means that in a couple of places we have to kludge in the other direction, i.e. from_dict and to_dict will no longer handle or produce the underscore versions of the names.
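The alias behavior described above can be illustrated with a minimal standard-library sketch. `AliasDictMixin` and `NodeConfig` are hypothetical names, and the mixin only imitates the idea of alias-driven `from_dict`/`to_dict`; it is not mashumaro's implementation.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


class AliasDictMixin:
    """Hypothetical stand-in that reads and writes keys by field alias."""

    @classmethod
    def from_dict(cls, data: Dict[str, Any]):
        kwargs = {}
        for f in fields(cls):
            # Look up the alias (dash form) if one is declared, else the field name.
            key = f.metadata.get("alias", f.name)
            if key in data:
                kwargs[f.name] = data[key]
        return cls(**kwargs)

    def to_dict(self) -> Dict[str, Any]:
        # Serialize using the alias, so output keys match the yaml spelling.
        return {
            f.metadata.get("alias", f.name): getattr(self, f.name)
            for f in fields(self)
        }


@dataclass
class NodeConfig(AliasDictMixin):
    enabled: bool = True
    pre_hook: List[str] = field(default_factory=list, metadata={"alias": "pre-hook"})
    post_hook: List[str] = field(default_factory=list, metadata={"alias": "post-hook"})


cfg = NodeConfig.from_dict({"pre-hook": ["select 1"]})
# The underscore spelling is not recognized in this sketch, so it is ignored.
underscore = NodeConfig.from_dict({"pre_hook": ["select 1"]})
```

In this sketch the underscore key is silently dropped, which is why callers that still receive `pre_hook`/`post_hook` need an explicit conversion step before calling `from_dict`.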
Code to convert to and from pre-hook/post-hook has been removed from the pre_serialize and post_serialize methods in model_config.py. Code to convert to and from dashes/underscores has been removed from methods in dbtClassMixin. The HyphenatedDbtClassMixin and the _hyphenated class variable have been removed since they are no longer required.
All fields that are hyphenated in yaml have had an "alias" field metadata option added.
Some extra code has been added to convert pre_hook and post_hook in dbt_project to pre-hook and post-hook, because doing that was specifically enabled by an earlier pull request.
The "register_pattern" method used by hologram has been removed and fields that validated via pattern have been "Annotated" with a Pattern.
The "resource_type" in various node classes had a "restrict" field metadata which was used by hologram to construct jsonschemas. The fields have been changed to a Literal, which results in a "const" field definition in jsonschema, which seems to work fine.
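As a rough illustration of the two changes above (pattern-validated fields expressed through `Annotated`, and `resource_type` expressed as a `Literal` that becomes "const"), here is a standard-library sketch. The `Pattern` class, `field_schema` function, and `ModelNode` class are hypothetical stand-ins, not mashumaro's own types, and mapping a multi-value `Literal` to "enum" is an assumption of this sketch.

```python
import re
from dataclasses import dataclass, fields
from typing import Annotated, Any, Dict, Literal, get_args, get_origin


@dataclass(frozen=True)
class Pattern:
    """Hypothetical annotation marker carrying a regular expression."""
    regex: str


def field_schema(annotation: Any) -> Dict[str, Any]:
    """Build a tiny JSON schema fragment for a small subset of annotations."""
    if get_origin(annotation) is Annotated:
        base, *extras = get_args(annotation)
        schema = field_schema(base)
        for extra in extras:
            if isinstance(extra, Pattern):
                schema["pattern"] = extra.regex
        return schema
    if get_origin(annotation) is Literal:
        values = get_args(annotation)
        # A single allowed value maps to "const"; several map to "enum" (assumption).
        return {"const": values[0]} if len(values) == 1 else {"enum": list(values)}
    if annotation is str:
        return {"type": "string"}
    return {}


@dataclass
class ModelNode:
    resource_type: Literal["model"]
    name: Annotated[str, Pattern(r"^[A-Za-z_][A-Za-z0-9_]*$")]


# dataclasses keeps the raw annotation (including Annotated extras) in f.type.
schema = {f.name: field_schema(f.type) for f in fields(ModelNode)}
```

Here `schema["resource_type"]` is a "const" definition and `schema["name"]` carries the regular expression under "pattern", so a validator can check both without any registry of named patterns.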
New "json_schema" and "validate" methods have been added to dbtClassMixin to replace the ones that came from hologram.
The "_get_fields" and "_get_field_names" methods that were in hologram and were used by model_config.py have been copied and updated to use field aliases instead of the "field_mapping" dictionaries.
The "Port" field used in the Connection object has been changed back to the original NewType because mashumaro now supports NewType and it was causing problems in json_schema generation.
The PortEncoder class which was used by hologram has been replaced by field Annotations on the "port" field.
The TimeDeltaFieldEncoder and PathEncoder fields don't appear to have been used and have been removed.
We changed the dependency from hologram to jsonschema.
The manifest schema version was bumped to 11 and a schema was generated. At this point the generated schema doesn't use field definitions because there is a bug when doing that, so it's quite a bit larger, but hopefully that will be addressed before this PR is closed.