Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Pydantic v2 #163

Closed
wants to merge 6 commits into from

Conversation

ljodal
Copy link
Contributor

@ljodal ljodal commented Jul 17, 2023

This upgrades the library to use Pydantic v2. As currently implemented this is very much a breaking change. It also requires Pydantic v2 now and I've not looked into the possibility of supporting both v1 and v2.

The gist of the changes are:

  1. Add a Base64URLBytes type and use it for all bytes fields in WebAuthnBaseModel subclasses. This is similar to - and based on - Pydantic v2's Base64 type, but handles the URL safe variant of base64 and missing padding. Base64 encoding is only done when in JSON mode, when serializing to/validating from Python nothing is done to the value
  2. Update the validator on WebAuthnBaseModel to the new @field_validator interface
  3. Replace usage of deprecated pydantic APIs related to JSON encoding/decoding of models
  4. Update some dependencies, including pydantic
  5. The json_loads_base64url_to_bytes helper has been removed, as it's no longer needed

Note that I've not updated docs/readme etc yet. I'd like some input on this approach before I head down that road

@ljodal ljodal mentioned this pull request Jul 17, 2023

@validator("*", pre=True, allow_reuse=True)
def _validate_bytes_fields(cls, v, field: ModelField):
@field_validator("*", mode="before")
Copy link
Contributor Author

@ljodal ljodal Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also possible to handle base64 decoding of bytes fields here, but I found no way to use the same "*" pattern for @field_serializer so I wasn't able to make that work both ways. So because of that I decided to use a separate type instead.

It might be possible to do it in a custom @model_serizlier, but that seems a bit dirty to me. Open to different suggestions though :)

Comment on lines +37 to +39
# Specify the JSON schema of the field, which is a string with the base64
# format
WithJsonSchema({"type": "string", "format": "base64"}),
Copy link
Contributor Author

@ljodal ljodal Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One benefit of having this be a separate type like this is that you also easily get correct types in the JSON schema :)

@ljodal
Copy link
Contributor Author

ljodal commented Jul 31, 2023

@MasterKale Any chance you could take a look at this?

@MasterKale
Copy link
Collaborator

Hello @ljodal, thanks for the bump. Rest assured I haven't forgotten about this.

I'm a little hesitant to jump wholeheartedly into v2 because if something goes wrong I'm not sure I'd have time to fix the library. I haven't had as many opportunities to work on the Pydantic v2 upgrade as I'd like...

Help me remember, what would upgrading Pydantic to v2 in here mean for projects consuming py_webauthn and potentially still on Pydantic v1? Is pip/Python smart enough to handle installing v1 and v2? I don't usually have to know this as a user of libraries so I don't have great perspective as a library maintainer on potential fallout from upgrading library dependencies.

@ljodal
Copy link
Contributor Author

ljodal commented Aug 1, 2023

You cannot have multiple versions installed, so it's either v1 or v2. So this is currently blocking us from upgrading to v2.

I think maybe v2 ships with the entirety if v1 in a submodule though, so it might be possible to make no changes here and maybe be compatible with both versions by just changing some imports around. For anyone using the library that means they're bound to use the v1 module thoigh, regardless of wether they have v1 or v2 installed, they're just not compatible

@ljodal
Copy link
Contributor Author

ljodal commented Aug 1, 2023

There's a short section about it in the migration guide: https://docs.pydantic.dev/2.0/migration/#continue-using-pydantic-v1-features

@MasterKale
Copy link
Collaborator

I'm curious what you might think about #164 as an interim step, before forcing all users of py_webauthn to upgrade to Pydantic v2 🤔

Another thought I had is refactoring out Pydantic use. This library would lose runtime type checks, and I'd have to hand-roll logic to go to/from JSON, but it'd also reduce the dependencies count and as a dependency minimalist that's a Good Thing for me.

@jwag956
Copy link

jwag956 commented Aug 1, 2023

Or go back to using attrs which provides for strong typing and self-documentation, without the runtime checks.

@martynsmith
Copy link

@MasterKale Yeah, it'd be nice to just not have the dependency at all. If you're pressed for time I could probably throw together a version that just removes Pydantic entirely (mostly because I really wanna install v2 in my project but this is currently holding me back) :-)

@martynsmith
Copy link

Actually, reading through a bit more of the code, pydantic does seem to be a reasonable option (you'd have a lot more code without it). The argument of people being forced to use v1 though is interesting, they'd only be forced to use v1 interacting with this module (which they probably wouldn't even notice given they're just using the classes you've already made) and they'd be free to use v2 throughout the rest of their project?

@ljodal
Copy link
Contributor Author

ljodal commented Aug 2, 2023

I'm curious what you might think about #164 as an interim step, before forcing all users of py_webauthn to upgrade to Pydantic v2 🤔

Seems like an okay interim approach, but with it's limitations. I left a comment on the PR with some further thoughs

Another thought I had is refactoring out Pydantic use. This library would lose runtime type checks, and I'd have to hand-roll logic to go to/from JSON, but it'd also reduce the dependencies count and as a dependency minimalist that's a Good Thing for me.

I don't have any strong opinions here, but I think that'd be an ever larger breaking change than the pydantic upgrade? For example we have a WebAuthnBaseModel subclass in our project, how would that work without pydantic, would you want to reimplement everything or just limit yourself to just the models defined in the library?

Or go back to using attrs which provides for strong typing and self-documentation, without the runtime checks.

Pydantic does a lot more than attrs, so I'm not sure if that'd be a good approach here? You actually want the runtime checks when parsing data from clients and pydantic (especially v2) does that in a nice and fast way

@ljodal
Copy link
Contributor Author

ljodal commented Aug 2, 2023

#164 would cause some additional headaches for us upgrading our project to Pydantic v2, as mentioned in the PR, but at least it would unblock the dependency itself and I could try to solve the new problems.

Could another option be to release this (or some other actual upgrade to v2 variant) as an alpha? That would allow users to upgrade, but with the caveat that it's not "officially" supported and to expect potentially breaking changes.

@jwag956
Copy link

jwag956 commented Aug 2, 2023

If pydantic is truly as globally used as it says - then there will be either a huge pushback or quick acceptance. Possibly the easiest thing to do is for a short time support 2 versions of py_webauthn - with different major release numbers - there aren't that many changes happening. That keeps code simple with less possibility of introducing bugs in one or the other. If the 1000's of other packages that use pydantic move quickly - then you can officially stop supporting the V1 version.
Taking the effort to have a single package support both doesn't seem to buy much since from what I have read - you can't actually have both versions installed at the same time anyway. Packaging metadata can make sure that pip installs the correct version based on what the application has requested.

@MasterKale
Copy link
Collaborator

Possibly the easiest thing to do is for a short time support 2 versions of py_webauthn - with different major release numbers

"py_webauthn that uses Pydantic v2" would definitely involve a major version bump. webauthn==1.9.0 would be the last version to support Pydantic v1 (because it already essentially pins to the last v1 release of Pydantic) while webauthn==2.0.0 would include changes like the ones proposed in this diff. At least that's how I've seen this all going down.

What's still up in the air with me is if there's an attempt at a "1.9.1" version that tries to survive being usable in projects that could be using either v1 or v2 while finalizing the plan to get this library onto v2.

As I type it out I'm not liking the complexity of supporting such a "1.9.1" release. I think this project should have the typical 1.9.0 => 2.0.0 release, with the major version change indicating breaking API changes (in so far as we can't mask most of them with refactors to the various helper methods I emphasize use of in the example.)

🤔

@martynsmith
Copy link

martynsmith commented Aug 2, 2023

Yeah, I still support the idea of webauthn supporting both versions of pydantic. I'm definitely in a situation right now where I'd really like to update pydantic to v2 in my own project but I'm restricted because it'll break webauthn (even if it's just a temporary situation).

@@ -19,25 +18,27 @@ class WebAuthnBaseModel(BaseModel):
- Encodes bytes to Base64URL
- Converts snake_case properties to camelCase

`Model.parse_raw()` (from JSON):
`Model.model_validate_json()` (from JSON):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the reference to modelInstance.json() above need to get updated too to mention model_dump_json() instead?

return value


Base64URLBytes = Annotated[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base64URLBytes seems confusing to me because at first glance "is it string bytes? Or maybe just a byte array?"

The fact that this gets encoded and decoded between base64url and bytes feels lost in the name. I keep trying to think of a better name for this, though, and don't have any suggestions yet. 🤔

return urlsafe_b64decode(value + b"====")
except ValueError as e:
raise PydanticCustomError(
"base64_decode", "Base64 decoding error: '{error}'", {"error": str(e)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to try and remove as much ambiguity as to whether base64 or base64url is in play. I specifically chose base64url for everything because the spec declares it as a dependency:

https://www.w3.org/TR/webauthn-2/#sctn-dependencies

Thus in places like here I want care to be taken to ensure that we're using base64url through and through.

How about something like this instead:

Suggested change
"base64_decode", "Base64 decoding error: '{error}'", {"error": str(e)}
"base64url_decode", "Base64URL decoding error: '{error}'", {"error": str(e)}

Comment on lines +31 to +32
# Parse input strings as base64url encoded values. This assumes that bytes
# values have already been base64 deocded.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Parse input strings as base64url encoded values. This assumes that bytes
# values have already been base64 deocded.
# Parse input strings as base64url encoded values. This assumes that bytes
# values have already been base64url-decoded.

Comment on lines +34 to +35
# When serializing to JSON, base64 encode the value. In Python mode we
# don't do anything
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# When serializing to JSON, base64 encode the value. In Python mode we
# don't do anything
# When serializing to JSON, base64url-encode the value. In Python mode we
# don't do anything

In Python mode we don't do anything

I'm not sure what you mean by this, can you please help me understand?

Comment on lines +37 to +38
# Specify the JSON schema of the field, which is a string with the base64
# format
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Specify the JSON schema of the field, which is a string with the base64
# format
# Specify the JSON schema of the field, which is a base64url-encoded string

@MasterKale
Copy link
Collaborator

I opted for PR #166 to add intermediate Pydantic V2 support so this isn't needed anymore.

@MasterKale MasterKale closed this Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants