-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Change UUID representation to bytes and add integration tests #8267
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small comments, thanks for fixing this @HonahX
@@ -139,7 +139,7 @@ def literal(value: L) -> Literal[L]: | |||
elif isinstance(value, str): | |||
return StringLiteral(value) | |||
elif isinstance(value, UUID): | |||
return UUIDLiteral(value) | |||
return UUIDLiteral(value) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return UUIDLiteral(value) # type: ignore | |
return UUIDLiteral(value.bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion. I tried this and it turned out that the error here:
Incompatible return value type (got "UUIDLiteral", expected "Literal[UUID]") [return-value]
is due to the literal
function expecting a return value of Literal(<Type of value>)
.
Hence, we need this ignore since UUIDLiteral is now a Literal[bytes]
I chose to let the initialization of UUIDLiteral
still take UUID
as input to maintain minimal changes to initialization code in other places. I think letting UUIDLiteral.__init__
takes bytes
or Union[bytes, uuid]
is also reasonable. Please let me know if you prefer other way of initialization this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the UUIDLiteral
is not intended to be used by the user. The user would pass in:
>>> l = literal('27f6397c-65b9-4464-a2f5-f4ee81629134')
StringLiteral('27f6397c-65b9-4464-a2f5-f4ee81629134')
And then internally, when binding to the schema, it will be converted into an UUID, which is failing right now:
>>> from pyiceberg.types import UUIDType
>>> l.to(UUIDType)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/functools.py", line 914, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/expressions/literals.py", line 519, in to
raise TypeError(f"Cannot convert StringLiteral into {type_var}")
TypeError: Cannot convert StringLiteral into <class 'pyiceberg.types.UUIDType'>
Which means that we need to implement this one as well.
The same is for the bytes:
>>> from uuid import UUID
>>> u = UUID('27f6397c-65b9-4464-a2f5-f4ee81629134')
>>> l = literal(u.bytes)
>>> l
BinaryLiteral(b"'\xf69|e\xb9Dd\xa2\xf5\xf4\xee\x81b\x914")
>>> l.to(UUIDType)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/functools.py", line 914, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/Users/fokkodriesprong/Desktop/iceberg/python/pyiceberg/expressions/literals.py", line 640, in to
raise TypeError(f"Cannot convert BinaryLiteral into {type_var}")
TypeError: Cannot convert BinaryLiteral into <class 'pyiceberg.types.UUIDType'>
I think we need to extend the literal constructor with many more types, such as datetime
, date
, time
, UUID
, etc. For the datetime related, I've created a separate issue: #8282 Maybe we can add UUID
to literal(..)
in this PR to avoid backward incompatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! I added conversions from BinaryLiteral
to UUID
and from FixedLiteral
to UUID
, and related tests.
Here is a summarization of the supported conversion now:
from pyiceberg.expressions.literals import *
from pyiceberg.types import *
import uuid
test_uuid = uuid.UUID('d4ccb06d-e7de-48e6-ad05-649fe8deda50')
lu = literal(test_uuid)
ls = literal('d4ccb06d-e7de-48e6-ad05-649fe8deda50')
lbinary = literal(test_uuid.bytes)
lfixed = literal(test_uuid.bytes).to(FixedType(16))
assert lu.value == test_uuid.bytes
assert ls.to(UUIDType()) == lu
assert lbinary.to(UUIDType()) == lu
assert lfixed.to(UUIDType()) == lu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the console conversion from string to uuid fails because to
expect a type instance:
l.to(UUIDType())
Thanks @HonahX for fixing this and also smoothing out the code 👍🏻 |
Fixes: #8248
This PR changes the value stored in UUID literal from
uuid
tobytes
and add relevant integration tests for unpartitined uuid tables