Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some typing to the exceptions.py module #1019

Merged
merged 3 commits into from
Jun 5, 2024

Conversation

DanielNoord
Copy link
Contributor

@DanielNoord DanielNoord commented Nov 26, 2022

After getting the go-ahead in #1017 I thought I would test the waters both in size and scope of potential PRs.

Feel free to ask me to separate some of the commits into separate PRs. I thought this was quite manageable, especially if you go commit by commit.


I thought I would start with a fairly self contained module (and a module I interact with quite a lot with at work myself).
I think most of the changes are pretty straight forward. The first commit is hardest to review and for now I focused on the attributes that I was most sure about, keeping harder ones for later.

The second and third commit "fix issues". I understand that these are not actual issues and that through the dynamicness of Python these code paths actually currently work. So: what is the desired action here? Add ignores to tell mypy we know this works? Or add some isinstance checks and attribute definitions to make the code actually type safe. I lean towards the latter but don't want to impose my personal preference on a project I am not involved with until now. I'm more than happy to change those commits to do something differenty!

Happy reviewing 😄

Edit: Sorry for all the force pushes, after a helpful comment from @ AlexWaygood in the typeshed repo I found one issue with my initial PR.


📚 Documentation preview 📚: https://python-jsonschema--1019.org.readthedocs.build/en/1019/

@codecov-commenter
Copy link

codecov-commenter commented Nov 26, 2022

Codecov Report

Base: 98.22% // Head: 98.15% // Decreases project coverage by -0.06% ⚠️

Coverage data is based on head (477caf6) compared to base (09b28bb).
Patch coverage: 90.00% of modified lines in pull request are covered.

❗ Current head 477caf6 differs from pull request most recent head 7f0f5e0. Consider uploading reports for the commit 7f0f5e0 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1019      +/-   ##
==========================================
- Coverage   98.22%   98.15%   -0.07%     
==========================================
  Files          20       20              
  Lines        3545     3526      -19     
  Branches      537      537              
==========================================
- Hits         3482     3461      -21     
- Misses         47       48       +1     
- Partials       16       17       +1     
Impacted Files Coverage Δ
jsonschema/_utils.py 98.20% <ø> (ø)
jsonschema/exceptions.py 97.04% <90.00%> (-1.11%) ⬇️
jsonschema/tests/_suite.py 90.00% <0.00%> (-1.43%) ⬇️
jsonschema/_format.py 94.32% <0.00%> (-0.08%) ⬇️
jsonschema/tests/test_validators.py 98.99% <0.00%> (-0.01%) ⬇️
jsonschema/protocols.py 88.00% <0.00%> (ø)
jsonschema/validators.py 96.86% <0.00%> (ø)
jsonschema/tests/test_jsonschema_test_suite.py 100.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@DanielNoord
Copy link
Contributor Author

DanielNoord commented Nov 26, 2022

I have spent way too much time getting the documentation to build 😅

autoapi and autodoc really don't work well with signature typing. They can't follow imports to ignored files so anything from _utils is unfound. This could be solved by also including those files, but then we get warnings about _RaisesType not being found:

_RaisesType = typing.Union[

It seems silly that this should be so difficult but whatever I tried I couldn't get it to find all references successfully...

Edit: After some other failed attempts this now finally seems to pass everything on CI not only locally... Turned out to be a little bigger than I had hoped. Sorry for the git spam.

@DanielNoord DanielNoord force-pushed the typing/base branch 2 times, most recently from 493f11e to 7aa8068 Compare November 26, 2022 19:04
Copy link
Member

@Julian Julian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! (And sorry, yes, tools love fighting with each other at this point...)

I left some initial comments coming back from Thanksgiving break. Will have to take a closer look in a bit, so probably will do a second pass. What I left so far is mostly surface level though there are some minor type tweaks.

Definitely appreciated!

WEAK_MATCHES: frozenset[str] = frozenset(["anyOf", "oneOf"])
STRONG_MATCHES: frozenset[str] = frozenset()

_unset = _utils.Unset()


class _Error(Exception):

_word_for_schema_in_error_message: str | None = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be None, just str, though I guess I see you're adding it to the (never-instantiated _Error). But yeah can we just leave it at str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used None as for most _Errors this is indeed None.

Ideally we would create a private subclass of _Error that has this defined as str and which can then be subclassed by the actual errors that have this attribute. But since I didn't want to introduce new subclasses without approval I thought I would propose this first. Would new subclasses for the classes that use this attribute be okay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Can you elaborate on what you mean by most _Errors having this be none?

I could be misremembering my own library, but every "encountered" instance of _Error should be of a concrete subclass (i.e. ValidationError or SchemaError), both of which define these class attributes appropriately, so no end-user should ever see an _Error where it's anything but str. Am I missing what you mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the __str__ implementation that uses these attributes is on _Error, even though the attribute will only ever be defined on two of its subclasses. This would always throw up any typechecker as _Error.__str__ uses attributes that can not always be accessed.
This can be avoided by either adding the attribute to _Error as being None unless they are set in the two subclasses that do, or have an inbetween subclass that uses the attributes and remove the attributes from _Error.__str__

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire purporse of __str__ is just to code-share between the two concrete implementations, so I'm not sure I follow what you mean by introducing a(nother in-between) subclass, it would have the same issue you're mentioning.

I definitely understand the bit about a type checker not being happy with __str__ though, but yeah as I've said elsewhere, I'm not keen on making any runtime changes to satisfy the type checker, so if there's a way to just tell it to ignore the attributes great, if not I suspect an even better solution (especially given the number of exceptions is likely to grow) is going to be turning _Error into a factory function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make an example PR for the sub classes example. However, as long as the documentation doesn't build I don't think it makes a lot of sense to pursue these changes anyway. Some help with that PR would be appreciated (but I don't want to rush you, everybody has other priorities than OSS).

@@ -142,12 +159,18 @@ def _contents(self):
)
return dict((attr, getattr(self, attr)) for attr in attrs)

def _matches_type(self):
def _matches_type(self) -> bool:
if isinstance(self.schema, _utils.Unset):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer adding type: ignores rather than any runtime things here yeah.

It's a programmer error to get here if schema is somehow unset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

if any(m is _unset for m in essential_for_verbose):
if (
any(m is _unset for m in essential_for_verbose)
or self._word_for_schema_in_error_message is None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same here, I'd rather not add this, _Error is not actually meant to be instantiated, it's just an interface, though a messy one clearly with concrete behavior in it. Some day maybe it should instead be a function.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above!

path=(),
cause=None,
message: str,
validator: str | _utils.Unset = _unset,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently I don't have ReadTheDocs preview builds enabled (ugh), but I'm worried all these will show up in documentation, when they shouldn't -- the user facing type of validator is str. It's an internal implementation detail that it can be unset for the time validation is still ongoing, but I don't want users to see that, and it's not a part of its public API. Will build the docs to see what shows up (and probably enable preview builds).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR the typing no longer shows up as adding the typing annotations completely breaks the documentation builds.

I understand that it should be str but there is really no way to satisfy a type checker when you are assigning _unset to something that is annotated with str. Is it really that important that Unset doesn't leak? I found it quite easy to reason about when I first saw it and understood that validator should be str unless I'm breaking the package 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR the typing no longer shows up as adding the typing annotations completely breaks the documentation builds.

Hm. As in totally disables it displaying in documentation at all?

I definitely want it showing up there, it's really the only reason I care about typing at all, though I definitely feel the tooling pain with these tools, but yeah that's really the primary motivation for typing info, so I'd be worried if that's totally disabled now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that autoapi doesn't understand typing aliases... So even if I get all other references to resolve correctly it will still complain about:

_RaisesType = typing.Union[

This is used in a signature and based on the documentation that autoapi builds any reference to _RaisesType will cause a warning. That's why in the end I disabled the typing from showing up in the docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readthedocs/sphinx-autoapi#224 it looks like is maybe the/a relevant issue -- but yeah to be upfront without trying to waste any of your time I definitely would consider not showing typing info in the docs to be a non-starter unfortunately, as I say, it's the main thing I care about :/ so it may require some thinking if indeed stuff doesn't work there before merging additional changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's indeed the issue. Would just using autodoc instead of autoapi be okay? You'd need to manually rewrite the index whenever you write a new module, but I think autodoc should be able to find all relevant classes needed to create the documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to the idea (it's how I used to do things before autoapi). If sphinx-apidoc produces the right output with minimal modifications I'm even more open to it. But yeah :) if the diff is reasonable then yes.

schema_path=(),
parent=None,
type_checker=_unset,
instance: dict[str, Any] | _utils.Unset = _unset,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instance is type Any (any Python object can be validated).

parent=None,
type_checker=_unset,
instance: dict[str, Any] | _utils.Unset = _unset,
schema: dict[str, Any] | _utils.Unset = _unset,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema is at the minute type Mapping[str, Any] | bool.


Zac
HD

Berman
Libera
GPL

# json properties
additionalProperties
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is somehow autodoc pulling things in again? Basically none of these should really be here, because none of them are really words, they should always either be surrounded in backticks (which will disable spell-checking them) or be used in :kw:foo`` roles. But autodoc can be funny there and find these words in documents we don't control...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try one more time to fix this 😄

I have an idea that might work, but I can't promise much. It is indeed autodoc and autoapi pulling more stuff in.

@DanielNoord
Copy link
Contributor Author

Rebased on main and ready for review again!

validator=_unset,
path=(),
cause=None,
validator: str | _utils.Unset = _unset,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these _unsets still need removing / # type: ignore'ing as per the above, I still don't want them appearing in docs (I don't want end-users knowing _unset exists at all even).

Copy link
Contributor Author

@DanielNoord DanielNoord Feb 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for never getting back on this, life became busy.

I don't really know how to resolve this as I think it would start to become really hard to add typing to the package itself if we were to add type: ignore everywhere. If it is not part of the public API then it will be hard to "hide" it from users if it is also being used in parts that are exposed to the public API (such as the exception classes).

I'd be very happy to help add typing and refactor some code so that internal details don't get exposed, but adding incorrect typing and then adding type: ignore seems to do more harm than good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update and no worries on the delay, sorry for my own.

To be direct, kind of like #1027 (comment) I'm not sure I'm ready to merge this as-is in that case -- I have to kind of stew on it a bit. I've hopefully said a few times before, I think for documentation and editor support that I'm quite happy to merge or add typing related fixes, but beyond the two I really honestly think it's all overall more annoyance than it helps still, so while I'm slightly sympathetic, I will always still go with the former (docs) over anything else -- I want to get some $ref related functional fixes out, which will likely be another few weeks, but then I can perhaps think a bit more about whether it's possible to get an improvement that I'm happier with. Just being transparent to not waste any of your time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you as well for the update!

I think for documentation and editor support that I'm quite happy to merge or add typing related fixes, but beyond the two I really honestly think it's all overall more annoyance than it helps still

I'm not sure this will convince you, but note that in my company we have more use cases. We have quite a bit of type: ignores in our code where we know that the typing stubs of jsonschema in typeshed are incorrect and where mypy warns us that we are making typing errors.
Having the typing more directly assosicated with the package would expose some implementation details, but at least the typing would be correct and we (and mypy) can rely on that.

Just being transparent to not waste any of your time.

Thanks for this either way!

@DanielNoord
Copy link
Contributor Author

Is there any interest in this PR? Or shall I just close it?

@Julian
Copy link
Member

Julian commented Jun 5, 2024

Apologies for letting this falling off the map. I'll have a look today. I'll likely merge what I'm comfortable with (regarding what we were discussing about not leaking Unset), even if I'm not sure if it gets you all of the things you were hoping for. I'm happy to keep discussing if there's still a large gap afterwards, though I really think we need a type checker plugin which allows differentiating between public and private type signatures...

But let me have another look and really appreciate your effort.

Julian added a commit that referenced this pull request Jun 5, 2024
I don't want them appearing in public documentation. The public types of
these arguments do not include Unset, a user may only pass them the
declared type, even though internally the type is larger.

Refs: #1019 (comment)
@Julian Julian merged commit ceb362a into python-jsonschema:main Jun 5, 2024
@Julian
Copy link
Member

Julian commented Jun 5, 2024

(Done)

Please feel encouraged to give more feedback (or certainly propose further work); I hope to take less than 2 years for any such future discussion :(

But yeah really appreciated, and hopefully even though it's not exactly as proposed, I hope what's merged is at least slightly better for you than previous.

@DanielNoord DanielNoord deleted the typing/base branch June 5, 2024 07:12
@dangotbanned
Copy link

@DanielNoord @Julian
I'm not sure where to open an issue for this, as the cause for vega/altair#3475 seems to be typeshed conflicting with the aims set out in this PR - to not leak Unset | _unset

Following v4.23.0 the new stubs now expose Unset https://github.com/python/typeshed/pull/12301/files

I'm very aware jsonschema did not author the stubs, but I imagine this change was not anticipated - as a result of https://github.com/python-jsonschema/jsonschema/releases/tag/v4.23.0

@Julian
Copy link
Member

Julian commented Jul 12, 2024

I actually honestly don't know either what the right specific thing to do is if typeshed did something different.

From an end-user perspective as I tried to say in this PR, it's just plain incorrect -- the public type of that attribute is str, will always be str, and it's an implementation detail that in the course of assembling the object we still do this temporary _Unset nonsense which we'd never do if it were authored today. But no one downstream using the attribute should be seeing anything but str, if they did it would be a bug in this library.

It sounds perhaps at least like we would return to the status quo if a PR to typeshed removed the change to that line and returned it to how they had it before?

dangotbanned added a commit to dangotbanned/altair that referenced this pull request Jul 12, 2024
@dangotbanned
Copy link

I actually honestly don't know either what the right specific thing to do is if typeshed did something different.

From an end-user perspective as I tried to say in this PR, it's just plain incorrect -- the public type of that attribute is str, will always be str, and it's an implementation detail that in the course of assembling the object we still do this temporary _Unset nonsense which we'd never do if it were authored today. But no one downstream using the attribute should be seeing anything but str, if they did it would be a bug in this library.

It sounds perhaps at least like we would return to the status quo if a PR to typeshed removed the change to that line and returned it to how they had it before?

Thanks for the quick response @Julian and confirming my suspicion

I agree, that would make the most sense in a PR to typeshed

@DanielNoord
Copy link
Contributor Author

As PR author I agree with patching typeshed. I think packages should strive towards typing their code "as is" and let dependencies deal with any "nonsense". This allows type checkers to actually verify the code of the package which in the long run probably provides more benefit to downstream packages than avoiding their nonsense.

However, while jsonschema is not fully typed (and therefore the type checker provides no benefit for downstream), typeshed should help users a little and only expose the types as they are at runtime.

binste pushed a commit to vega/altair that referenced this pull request Jul 13, 2024
…3475)

* fix(typing): Confirm arg type before joining in `_deduplicate_enum_errors`

Fixes `mypy` error that appeared during a merge #3467 (comment)

* revert: undo last commit

* revert: Ignore type error introduced by `typeshed`

The issue is not related to any runtime changes python-jsonschema/jsonschema#1019 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants