Add Union support to check_types: Bugfix/977 #995
Codecov Report
Base: 96.70% // Head: 96.85% // Increases project coverage by +0.15%
Additional details and impacted files
@@ Coverage Diff @@
## main #995 +/- ##
==========================================
+ Coverage 96.70% 96.85% +0.15%
==========================================
Files 42 42
Lines 4221 4296 +75
==========================================
+ Hits 4082 4161 +79
+ Misses 139 135 -4
Per the Codecov report (it won't let me see the details there), I added an additional test to hit the continue statements for non-dataframe typed inputs. When checking locally for the test coverage on Python 3.9, it still wasn't catching the continue statement on
EDIT: Looks like that worked :)
Thanks @kr-hansen! I'm working on a DCO bot to make sure folks sign the DCO agreement (basically just signing the git commit) on each PR. I'll ping you here when it's ready (probably by tmrw/day after).
@cosmicBboy sounds good. I'll look forward to that. I've never done a DCO before. Does it have any verbiage around it I could pass on to our legal team for approval or is it just implied with it?
hey @kr-hansen would you mind opening up a new PR to pick up on the DCO bot requirement? Basically all you need to do is sign off your commits for this PR with the You may have to follow this guide in order to retroactively sign off the commits in this PR. Re: the verbiage, the DCO document is standardized, so you can send your legal team this: https://developercertificate.org/
Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
@cosmicBboy Sounds good. That was pretty straightforward. Note I opened a PR that I closed here because of the DCO. When doing the interactive mode, you need to do an I've passed along the DCO verbiage to my legal folks and will let you know when I hear back, to make certain all the Ts and Is are crossed and dotted. I'll respond here again after that OK and hopefully everything should be good to merge then.
@cosmicBboy My legal team is good with the DCO so things are cleared from my side to merge whenever. Once those checks run again and assuming they pass again I think we're good to go if you think my implementation makes sense.
All right @cosmicBboy. I think everything looks good to go on this from my side and it has passed all tests. Ping me again after you get a chance to review it and I can integrate any suggested changes here.
Very nice work ✨!
The last thing we need is to update the documentation.
A header in this page would make sense, but I'll leave it up to you exactly where you think it makes sense. You can follow the conventions in the docs, but basically a description of Union types with SchemaModel with a few examples would be great to showcase:
- happy path example
- expected behavior in various failure scenarios (e.g. one of schemas fail vs. both fail)
- (other caveat I might have missed?)
@kr-hansen let me know if you want to work on the docs in another PR! If so I can approve and merge this one
Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
37609d0 to c20267b
@cosmicBboy Thanks for the pointer to updating the docs. I actually found a minor bug with how built-in types were handled in a Union when putting together the doc examples. I corrected that and updated the code, as well as pushed some examples to the documentation in the location you pointed to. Let me know if there is anything else you think I should do or tweak with the documentation I added to get this ready for merging. I think my preference would be to merge the change with the docs updated appropriately as well.
Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
@cosmicBboy So I found one additional bug my change last night introduced in the
docs/source/schema_models.rst
Outdated
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series
from typing import Union
@kr-hansen why de-indent this? It won't render properly: https://pandera--995.org.readthedocs.build/en/995/schema_models.html#validate-against-multiple-schemas
@cosmicBboy Good catch. My mistake while re-running those earlier tests. I didn't test the documentation as robustly as I should have based on your developer docs and just worked in the .rst file raw. Fixed in 30188b5
Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
looks like errors are being raised in the docs examples: https://github.com/unionai-oss/pandera/actions/runs/3439620099/jobs/5737128731
perhaps we can adapt the docs examples into unit tests so that we cover those use cases in tests?
So I built the docs examples from the unit tests I added earlier. This might come from my lack of knowledge around integrating running code in I basically put the working examples in the docs, plus the areas I have in It's also odd to me it is only happening in Python 3.8 and 3.9, but not 3.7 or 3.10 or Windows altogether...
Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
So looking at it closer, it wasn't erroring on my first example of code which only returned the error I reported. For the second example I tried to print both some true statements and an error statement, which may have been what was causing the issue? I just removed the second statement that was causing the mixed error from the docs now to see if it helps. Ultimately, I was initiating the same error as was provided in the first example which seems to work, but the output was a pydantic.ValidationError instead of a Pandera error because of the difference I was trying to show in that example with how
Cool! let's see... btw you can also build the docs locally with
Sorry, I should have read the documentation on how to contribute to the docs more carefully before contributing to the docs 🤦. Building the docs in my local 3.9 environment just worked, so I think my fix should work now for the 3.8 and 3.9 environments. Side Note: I only ever tested in my local 3.9 environment for the other code as well and somewhat relied on the CI builds for the other environments. After I cloned and tried running all the tests from trunk, nox was never able to successfully complete all the tests on trunk on my local machine before I made any changes to the repo.
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
looks great @kr-hansen! gonna include this in the next
@cosmicBboy Any details/way to track when the next minor release may be coming out? Excited to start using this :).
hey @kr-hansen hoping to release
Fixes #977
This PR adds support for `Union[DataFrame[S1], DataFrame[S2]]` in the `check_types` decorator. This was done by the following additions:

1. `annotated_schema_models` was updated to type of `Dict[str, Iterable[Tuple[Union[SchemaModel, None], Union[AnnotationInfo, None]]]]`. Specifically this could be a `List[Tuple[]]` but it was kept as `Iterable` per the discussion in the issue. The `None`s were added for mypy checking in the `_check_arg` function, rather than turning off the mypy warning there.
2. When `annotation_info` for an annotation is determined to be generic, it is checked against the Union type. If it is a Union, all the annotations are looped through, with model pairs being added the way they were outside. If it is not a Union, the previous `continue` behavior is maintained.
3. The `_check_arg` function was updated to properly handle Union cases in the following ways:
   a. The annotation_model_pairs values are now looped through from the `Iterable` level.
   b. When a schema validates in this for loop, the same behavior is maintained and the `arg_value` is returned after that initial validation.
   c. When a `SchemaError` is encountered in the `except` statement, it is passed to a `SchemaErrorHandler` object rather than being immediately raised. The loop then moves on to the next annotation_model_pair in search of a validation.
   d. If nothing validates, then the errors passed to the `SchemaErrorHandler` get raised after looping through all options. A singular error gets raised in the same way it was raised before. If there are multiple errors they get raised as an `errors.SchemaErrors` object.
4. A `test_check_types_union_args()` test was written to both validate inputs as expected and ensure there is a validation error on the outputs of functions with the `check_types` decorator. Open to suggestions on other tests I should add if there are other edge cases I seem to have added that aren't covered by these or previous tests.
5. Two existing tests were altered (`test_check_types_error_input` and `test_check_types_error_output`) because when the `SchemaError` gets passed to the `SchemaErrorHandler` the `.data` object gets scrubbed. Hence the assertion that the input and validated data are equal can no longer pass with the `check_types` decorator. The validation still occurs correctly; it is just no longer possible to check the data returned in the error against the input data. One option was to remove the data scrubbing by the `SchemaErrorHandler`, but that could make that object large, so the tests were altered on the assumption that this type of output is no longer available with the `check_types` decorator.

Note, this seems to have picked up a couple upstream changes that weren't made by me in the following files:
.github/workflows/ci-tests.yml
.pre-commit-config.yaml
docs/source/extensions.rst
environment.yml
pandera/extensions.py
requirements-dev.txt
tests/core/test_extensions.py
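For readers following the annotation handling in the PR description (checking whether an annotation is a Union and, if so, looping over its members), here is a minimal sketch using only the standard `typing` module. It is not pandera's implementation: pandera works with its own `AnnotationInfo` objects, and the helper names `candidate_annotations` and `annotations_by_argument` below are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List, Union, get_args, get_origin, get_type_hints


def candidate_annotations(annotation: Any) -> List[Any]:
    """Return the members of a Union annotation, or the annotation itself.

    Hypothetical helper: stands in for the "is this a Union?" check
    described in the PR, without any pandera types.
    """
    if get_origin(annotation) is Union:
        # Union[A, B] unpacks to [A, B]
        return list(get_args(annotation))
    # Non-Union annotations are kept as a single candidate
    return [annotation]


def annotations_by_argument(func) -> Dict[str, List[Any]]:
    """Map each annotated argument (and "return") to its candidate annotations."""
    hints = get_type_hints(func)
    return {name: candidate_annotations(hint) for name, hint in hints.items()}


def example(x: Union[int, str], y: float) -> Union[int, None]:
    """Toy function whose annotations are inspected below."""
    return None


# One list of candidates per argument, mirroring the idea of storing an
# iterable of (model, annotation) pairs per argument name.
pairs = annotations_by_argument(example)
```

Storing a list per argument name, even when there is only one candidate, lets the validation step treat Union and non-Union annotations with the same loop.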
some of the decorator tests, specifically test_check_decorator_coercion, test_check_output_coercion_error, and test_check_instance_method_decorator_error
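The `_check_arg` behavior laid out in the PR description (try each candidate schema, return on the first success, collect failures, then raise one error or an aggregate) can be sketched in plain Python. This is an illustration under stated assumptions, not pandera's code: `ValidationFailure`, `ValidationFailures`, `validate_against_any`, `only_zeros`, and `only_ones` are hypothetical stand-ins for `SchemaError`, `SchemaErrors`, the loop inside `_check_arg`, and two schemas.

```python
from typing import Any, Callable, Iterable, List


class ValidationFailure(Exception):
    """Hypothetical stand-in for a single schema error."""


class ValidationFailures(Exception):
    """Hypothetical stand-in for an aggregate of several schema errors."""

    def __init__(self, failures: List[ValidationFailure]):
        super().__init__(f"{len(failures)} candidate schemas failed")
        self.failures = failures


def validate_against_any(
    value: Any, validators: Iterable[Callable[[Any], Any]]
) -> Any:
    """Return the first successful validation result, else raise."""
    collected: List[ValidationFailure] = []
    for validate in validators:
        try:
            # First candidate that validates wins; stop looking.
            return validate(value)
        except ValidationFailure as exc:
            # Defer the error and move on to the next candidate.
            collected.append(exc)
    if not collected:
        # No candidates at all: nothing to validate against.
        return value
    if len(collected) == 1:
        # A single failure is raised unchanged, as before Union support.
        raise collected[0]
    raise ValidationFailures(collected)


def only_zeros(values: List[int]) -> List[int]:
    if any(v != 0 for v in values):
        raise ValidationFailure("expected only zeros")
    return values


def only_ones(values: List[int]) -> List[int]:
    if any(v != 1 for v in values):
        raise ValidationFailure("expected only ones")
    return values


# Happy path: the first candidate fails, the second validates.
result = validate_against_any([1, 1], [only_zeros, only_ones])
```

With `[0, 1]` as input both candidates fail, so the sketch raises the aggregate `ValidationFailures`; with a single candidate it re-raises that candidate's `ValidationFailure` directly.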