Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Union support to check_types: Bugfix/977 #995

Merged
merged 10 commits into from
Nov 11, 2022

Conversation

kr-hansen
Copy link
Contributor

Fixes #977

This PR adds support for Union[DataFrame[S1], DataFrame[S2]] in the check_types decorator. This was done by the following additions

  1. annotated_schema_models was updated to type of Dict[str, Iterable[Tuple[Union[SchemaModel, None], Union[AnnotationInfo, None]]]]. Specifically this could be a List[Tuple[]]but it was kept asIterableper the discussion in the issue. TheNoneswere added for mypy checking in the_check_arg` function, rather than turning off the mypy warning there.
  2. If the annotation_info for an annotation is determined to be generic, it is checked against the Union type. If it is a Union, all the annotations are looped through with model pairs being added the way they were outside. Otherwise the previous continue behavior was maintained if not a Union.
  3. The _check_arg function was updated to properly handle Union cases in the following ways:
    a. The annotation_model_pairs values are now looped through from the Iterable level.
    b. When a schema validates in this for loop, the same behavior is maintained and the arg_value should be returned after that initial validation.
    c. When a SchemaError is encountered in the except statement, it is passed to a SchemaErrorHandler object rather than being immediately raised. The next annotation_model_pair in the loop is moved to in searching for a validation.
    d. If nothing validates, then the errors passed to the SchemaErrorHandler get raised after looping through all options. A singular error gets raised in the same way it was raised before. If there are multiple errors they get raised as an errors.SchemaErrors object.
  4. The test_check_types_union_args() test was written to both validate inputs as expected and ensure there is a validation error on the outputs of functions with the check_types decorator. Open to suggestions on other tests I should add if there are other edge cases I seem to have added that aren't covered by these or previous tests.
  5. Two existing tests were altered (test_check_types_error_input and test_check_types_error_output) because when the SchemaError gets passed to the SchemaErrorHandler the .data object gets scrubbed. Hence the assertion that the input and validated data are equal was not able to pass anymore with the check_types decorator. The validation still occurs correctly, it is just not possible to ensure the returned data in the error is able to be checked against the input data. One option was to remove the data scrubbing by the SchemaErrorHandler, but that seemed to have the potential to make that object potentially large so the test was altered assuming that type of output is no longer available with the check_types decorator.

Note, this seems to have picked up a couple upstream changes that weren't made by me in the following files:
.github/workflows/ci-tests.yml
.pre-commit-config.yaml
docs/source/extensions.rst
environment.yml
pandera/extensions.py
requirements-dev.txt
tests/core/test_extensions.py
some of the decorator tests, specifically test_check_decorator_coercion, test_check_output_coercion_error, and test_check_instance_method_decorator_error

@cosmicBboy cosmicBboy changed the base branch from dev to master November 1, 2022 17:38
@codecov
Copy link

codecov bot commented Nov 1, 2022

Codecov Report

Base: 96.70% // Head: 96.85% // Increases project coverage by +0.15% 🎉

Coverage data is based on head (4326892) compared to base (38881b6).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #995      +/-   ##
==========================================
+ Coverage   96.70%   96.85%   +0.15%     
==========================================
  Files          42       42              
  Lines        4221     4296      +75     
==========================================
+ Hits         4082     4161      +79     
+ Misses        139      135       -4     
Impacted Files Coverage Δ
pandera/decorators.py 99.51% <100.00%> (+0.61%) ⬆️
pandera/typing/common.py 94.62% <100.00%> (+0.05%) ⬆️
pandera/io.py 100.00% <0.00%> (ø)
pandera/errors.py 100.00% <0.00%> (ø)
pandera/typing/__init__.py 100.00% <0.00%> (ø)
pandera/strategies.py 98.26% <0.00%> (+0.01%) ⬆️
pandera/dtypes.py 94.97% <0.00%> (+0.04%) ⬆️
pandera/schemas.py 99.42% <0.00%> (+0.15%) ⬆️
pandera/engines/pandas_engine.py 97.50% <0.00%> (+0.52%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@kr-hansen
Copy link
Contributor Author

kr-hansen commented Nov 2, 2022

Per the Codecov report (it won't let me see the details there), I added an additional test to hit the continue statements for non-dataframe typed inputs.

When checking locally for the test coverage on Python 3.9, it still wasn't catching the continue statement on pandera/typing/decorators.py line 605. However, one of the tests does hit that (confirmed with a breakpoint there). I think this was missed due to this issue in coverage. It looks like it should be solved in Python 3.10, but I'm not sure which version Codecov as you have setup uses.

EDIT: Looks like that worked :)

@cosmicBboy
Copy link
Collaborator

Thanks @kr-hansen! I'm workin on a DCO bot to make sure folks sign the DCO agreement (basically just signing the git commit) on each PR. I'll ping you here when it's ready (probably by tmrw/day after).

@kr-hansen
Copy link
Contributor Author

kr-hansen commented Nov 4, 2022

@cosmicBboy sounds good. I'll look forward to that. I've never done a DCO before. Does it have any verbiage around it I could pass on to our legal team for approval or is it just implied with it?

@cosmicBboy cosmicBboy closed this Nov 4, 2022
@cosmicBboy cosmicBboy reopened this Nov 4, 2022
@cosmicBboy
Copy link
Collaborator

hey @kr-hansen would you mind opening up a new PR to pick up on the DCO bot requirement?

Basically all you need to do is sign off your commits for this PR with the -s flag in git commit.

You may have to follow this guide in order to retroactivately sign off the commits in this PR.

Re: the verbiage, the DCO document is standardized, you send your legal team this: https://developercertificate.org/

Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
@kr-hansen
Copy link
Contributor Author

@cosmicBboy Sounds good. That was pretty straightforward.

Note I opened a PR that I closed here because of the DCO. When doing the interactive mode, you need to do an edit to allow for an --amend flag, rather than a reword when rebasing.

I've passed along the DCO verbiage to my legal folks and will let you know when I hear back for sure to make certain all the Ts and Is are dotted and crossed. I'll respond here again after that OK and hopefully everything should be good to merge then.

@kr-hansen
Copy link
Contributor Author

@cosmicBboy My legal team is good with the DCO so things are cleared from my side to merge whenever. Once those checks run again and assuming they pass again I think we're good to go if you think my implementation makes sense.

@kr-hansen
Copy link
Contributor Author

All right @cosmicBboy. I think everything looks good to go on this from my side and it has passed all tests. Ping me again after you get a chance to review it and I can integrate any suggested changes here.

Copy link
Collaborator

@cosmicBboy cosmicBboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work ✨!

The last thing we need is to update the documentation.

A header in this page would make sense, but I'll leave it up to you exactly where you think it makes sense. You can follow the conventions in the docs, but basically a description of Union types with SchemaModel with a few examples would be great to showcase:

  • happy path example
  • expected behavior in various failure scenarios (e.g. one of schemas fail vs. both fail)
  • (other caveat I might have missed?)

@cosmicBboy
Copy link
Collaborator

@kr-hansen let me know if you want to work on the docs in another PR! If so I can approve and merge this one

Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
@kr-hansen
Copy link
Contributor Author

@cosmicBboy Thanks for the pointer to updating the docs. I actually found a minor bug with how built-in types were handled in a Union when putting together the doc examples.

I corrected that and updated the code, as well as pushed some examples to the documentation in the location you pointed to.

Let me know if there is anything else you think I should do or tweak with the documentation I added to get this ready for merging. I think my preference would be to merge the change with the docs updated appropriately as well.

Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
@kr-hansen
Copy link
Contributor Author

@cosmicBboy So I found one additional bug my change last night introduced in the from_format portion that I didn't capture by re-running all the tests locally again. That should be fixed now. Sorry again for all the back and forth but this one should be good to go now I think.

import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series
from typing import Union
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@kr-hansen kr-hansen Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cosmicBboy Good catch. My mistake while re-running those earlier tests. I didn't test the documentation as robustly as I should have based on your developer docs and just worked in the .rst file raw. Fixed in 30188b5

kr-hansen and others added 2 commits November 10, 2022 11:36
Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
@cosmicBboy
Copy link
Collaborator

looks like errors being raised the docs examples: https://github.com/unionai-oss/pandera/actions/runs/3439620099/jobs/5737128731

@cosmicBboy
Copy link
Collaborator

perhaps we can adapt the docs examples into unit tests so that we cover those use cases in tests?

@kr-hansen
Copy link
Contributor Author

kr-hansen commented Nov 10, 2022

So I built the docs examples from the unit tests I added earlier.

This might come from my lack of knowledge around integrating running code in .rst files. So I followed the pattern in the .rst for this section which runs some code that works but returns an error in the case that doesn't work. The output I provided in the docs are the errors that should get returned and seem to match in the error stack traces. However, do you have to do something special in an .rst file to return an error like in the current docs to have it work ok?

I basically put the working examples in the docs, plus the areas I have in with pytest.raises() statements as the last call and printed out the error to give a sense of what to not do as well as what to do.

It's also odd to me it is only happening in Python 3.8 and 3.9, but not 3.7 or 3.10 or Windows all together...

Signed-off-by: kyle.hansen <15696062+kr-hansen@users.noreply.github.com>
@kr-hansen
Copy link
Contributor Author

kr-hansen commented Nov 10, 2022

So looking at it closer, it wasn't erroring on my first example of code which only returned the error I reported. For the second example I tried to print both some true statements and an error statement, which may have been what was causing the issue?

I just removed the second statement that was causing the mixed error from the docs now to see if it helps. Ultimately, I was initiating the same error as was provided in the first example which seems to work, but the output was a pydantic.ValidationError instead of a Pandera error because of the difference I was trying to show in that example with how Unions are handled in each way. I'm thinking that mixed with the standard print statements may have been what was causing the breakage on the docs, but we'll see.

@cosmicBboy
Copy link
Collaborator

Cool! let's see... btw you can also build the docs locally with make docs (might need to install the requirements-docs.txt file) so you can debug issues in rst code snippets

@kr-hansen
Copy link
Contributor Author

kr-hansen commented Nov 11, 2022

Sorry, I should have better read the documentation on how to contribute to the Docs prior to contributing to the docs 🤦 . Well building the docs locally in my 3.9 environment just worked locally, so I think my fix should work now for the 3.8 and 3.9 environments.

Side Note: I only ever tested in my local 3.9 environment for the other code as well and somewhat relied on the CI builds for the other environments. After I cloned and tried running all the tests from trunk, nox was never able to successfully complete all the tests on trunk on my local machine before I made any changes to the repo.

Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
@cosmicBboy
Copy link
Collaborator

looks great @kr-hansen ! gonna include this in the next 0.14.0 minor release. congrats on your first PR to pandera 🎉🚀

@kr-hansen
Copy link
Contributor Author

@cosmicBboy Any details/way to track when the next minor release may be coming out? Excited to start using this :).

@cosmicBboy
Copy link
Collaborator

hey @kr-hansen hoping to release 0.14.0 by the end of February!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pa.check_types doesn't seem to work with Union of multiple pandera types
2 participants