Support alternative UUID formats. #499

Merged on Jul 28, 2023 (4 commits)


@jcrist (Owner) commented Jul 28, 2023

This adds a new `uuid_format` option to the json and msgpack encoders, configuring how msgspec encodes UUIDs. It may be one of:

  • canonical: UUIDs are encoded as canonical strings (same as str(uuid)). This is the default.
  • hex: UUIDs are encoded as strings without hyphens (same as uuid.hex).
  • bytes: UUIDs are encoded as binary values representing big-endian 128-bit integers (same as uuid.bytes).
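As a rough illustration, each option corresponds to a standard-library representation of `uuid.UUID`. The `encode_uuid` helper below is hypothetical (it is not part of msgspec's API) and only returns the raw value each format would carry, before any JSON or msgpack framing is applied.

```python
import uuid


def encode_uuid(u: uuid.UUID, uuid_format: str = "canonical"):
    """Hypothetical helper: return the raw value for each uuid_format option."""
    if uuid_format == "canonical":
        return str(u)  # 36 characters, hyphenated
    if uuid_format == "hex":
        return u.hex  # 32 hex characters, no hyphens
    if uuid_format == "bytes":
        return u.bytes  # 16 bytes, big-endian 128-bit integer
    raise ValueError(f"unknown uuid_format: {uuid_format!r}")


u = uuid.UUID("827ba710-601d-421c-bdaf-632d2db2620c")
for fmt in ("canonical", "hex", "bytes"):
    print(fmt, encode_uuid(u, fmt))
```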

By default, decoding accepts any of these formats, with one exception: only msgpack supports encoding or decoding the 'bytes' format, since JSON lacks a native binary type.
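Decoding can accept every format because the representations are distinguishable by type and length: a 36-character hyphenated string, a 32-character hex string, or a 16-byte binary value. The hypothetical `decode_uuid` below sketches that dispatch using only the standard library; msgspec's real decoder may apply different validation rules.

```python
import uuid


def decode_uuid(value):
    """Hypothetical helper: accept canonical, hex, or 16-byte binary UUIDs."""
    if isinstance(value, (bytes, bytearray)):
        # Binary form: must be exactly 16 bytes (big-endian 128-bit integer).
        if len(value) != 16:
            raise ValueError("binary UUIDs must be exactly 16 bytes")
        return uuid.UUID(bytes=bytes(value))
    if isinstance(value, str):
        # Canonical form: 8-4-4-4-12 hex digits separated by hyphens.
        if len(value) == 36 and all(value[i] == "-" for i in (8, 13, 18, 23)):
            return uuid.UUID(value)
        # Hex form: 32 hex digits, no hyphens.
        if len(value) == 32 and "-" not in value:
            return uuid.UUID(hex=value)
        raise ValueError(f"invalid UUID string: {value!r}")
    raise TypeError(f"cannot decode a UUID from {type(value).__name__}")
```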

```
In [1]: import msgspec

In [2]: import uuid

In [3]: u = uuid.uuid4()

In [4]: u
Out[4]: UUID('827ba710-601d-421c-bdaf-632d2db2620c')

In [5]: msgspec.json.encode(u)  # defaults to canonical
Out[5]: b'"827ba710-601d-421c-bdaf-632d2db2620c"'

In [6]: hex_enc = msgspec.json.Encoder(uuid_format="hex")  # use the hex format

In [7]: hex_enc.encode(u)
Out[7]: b'"827ba710601d421cbdaf632d2db2620c"'

In [8]: msgspec.json.decode(_, type=uuid.UUID)  # decode supports all formats
Out[8]: UUID('827ba710-601d-421c-bdaf-632d2db2620c')

In [9]: bytes_enc = msgspec.msgpack.Encoder(uuid_format="bytes")  # use the bytes format

In [10]: bytes_enc.encode(u)
Out[10]: b'\xc4\x10\x82{\xa7\x10`\x1dB\x1c\xbd\xafc--\xb2b\x0c'

In [11]: u.bytes  # this ^^ is the msgpack encoded version of u.bytes
Out[11]: b'\x82{\xa7\x10`\x1dB\x1c\xbd\xafc--\xb2b\x0c'

In [12]: msgspec.msgpack.decode(bytes_enc.encode(u), type=uuid.UUID)  # roundtrips fine
Out[12]: UUID('827ba710-601d-421c-bdaf-632d2db2620c')
```
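The `b'\xc4\x10'` prefix in `Out[10]` is the msgpack `bin 8` header: the type byte `0xc4` followed by a one-byte length (`0x10` = 16). The hypothetical `msgpack_bin` function below reproduces only that framing with the standard library, to make the relationship between `Out[10]` and `Out[11]` explicit. It is not how msgspec is implemented, and it deliberately ignores the larger `bin 16`/`bin 32` forms.

```python
import uuid


def msgpack_bin(payload: bytes) -> bytes:
    """Hypothetical helper: frame a short payload as a msgpack bin 8 object."""
    if len(payload) > 0xFF:
        raise ValueError("bin 8 only covers payloads up to 255 bytes")
    # 0xc4 = bin 8 type byte, then a single length byte, then the raw bytes.
    return b"\xc4" + bytes([len(payload)]) + payload


u = uuid.UUID("827ba710-601d-421c-bdaf-632d2db2620c")
encoded = msgpack_bin(u.bytes)
print(encoded)  # should match Out[10] in the session above
```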

Fixes #493.

jcrist added 4 commits July 27, 2023 23:17
This adds support for decoding UUIDs from 16-byte binary inputs if
`strict=False`. The inputs are interpreted as big-endian 128-bit
integers. This is supported only for `msgspec.convert` and
`msgspec.msgpack.decode`, since other protocols don't have native
binary input support.
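Interpreting a 16-byte input as a big-endian 128-bit integer is the same mapping the standard library's `uuid.UUID(bytes=...)` uses, so the two routes below agree. This is an illustration with standard-library calls only, not msgspec's decoding code.

```python
import uuid

raw = bytes.fromhex("827ba710601d421cbdaf632d2db2620c")  # 16 raw bytes

# Route 1: read the bytes as a big-endian 128-bit integer.
as_int = int.from_bytes(raw, byteorder="big")
via_int = uuid.UUID(int=as_int)

# Route 2: let the uuid module interpret the bytes directly.
via_bytes = uuid.UUID(bytes=raw)

print(via_int == via_bytes)  # True
print(via_bytes)  # 827ba710-601d-421c-bdaf-632d2db2620c
```

Reading the same bytes as little-endian would yield a different integer, and therefore a different UUID, which is why the byte order has to be fixed by the format.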
This adds a new `uuid_format` option to the json and msgpack encoders,
configuring how msgspec encodes UUIDs. This may be one of:
- 'canonical': encoded as canonical strings (same as `str(uuid)`).
- 'hex': encoded as strings without hyphens (same as `uuid.hex`).
- 'bytes': encoded as binary values representing big-endian 128-bit
  integers (same as `uuid.bytes`).

Note that only msgpack supports the `'bytes'` value, as JSON lacks a
native binary type. Defaults to `'canonical'`.

This seems safe to do, since encoding UUIDs as 16-byte values is a
well-defined and common-enough format.
jcrist merged commit 6bb1937 into main on Jul 28, 2023
jcrist deleted the uuid-as-bytes branch on July 28, 2023 at 04:39
Linked issue: Support encoding UUIDs as bytes