Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8371297134 Type promotion between integral and floating types #2173

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

poodlewars
Copy link
Collaborator

To fix the bug reported in 8371297134

  • Write an int64 column
  • Append a float32 to it
  • Read it -> We blow up at read time

I've added logic so that when we merge descriptors we combine:

any integer + float 64 -> float 64
integer up to 16 bits + float 32 -> float 32
integer above 16 bits + float 32 -> float 64

This is because a 16 bit integer can safely fit in a 32 bit float without loss of precision, whereas a 32 bit integer cannot. We could instead always promote up to float 64 which would be more wasteful but simpler.

Separately, for query builder pipelines I added changes so that when combining a float and an integer we always promote up to float64. This misses the cases where it is actually safe to promote to float32, but is simpler and matches Pandas. This is the cause of the test change in test_sort_merge.py.

I reworked has_valid_type_promotion, introducing a new function is_valid_type_promotion_to_target instead. This is because the has_valid_type_promotion signature was dangerous - it returned an optional type that was often interpreted only as a bool (the actual type inside it was ignored). I've replace the call sites that only tested the bool with calls to the new is_valid_type_promotion_to_target that only returns a bool.

Update tests: I deliberately promote to float64 here so we don't lose precision from the int32
…get that just returns a bool

To prevent dangerous uses where only the static_cast<bool> of its return value was used, but where the type to be promoted to was not the second argument given to it.
@poodlewars poodlewars marked this pull request as ready for review February 11, 2025 09:22

auto target_size = entity::SizeBits::UNKNOWN_SIZE_BITS;
auto floating_size = is_floating_point_type(left_type) ? left_size : right_size;
auto integral_size = is_floating_point_type(left_type) ? right_size : left_size;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think using is_integer_type would be easier to follow.

} else {
// Non-numeric target type
return std::nullopt;
return false;
}
} else if (is_floating_point_type(source_type)) {
if (is_unsigned_type(target_type) || is_signed_type(target_type)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not part of the change but is a bit more readable

Suggested change
if (is_unsigned_type(target_type) || is_signed_type(target_type)) {
if (is_integer_type(target_type)) {

}

return target;
auto target_size = entity::SizeBits(uint8_t(std::max(left_size, right_size)) + 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get why this works but it seems a bit hacky and fragile.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the old logic moved around, so I'd rather not change it

return (is_integer_type(left) && is_floating_point_type(right)) || (is_floating_point_type(left) && is_integer_type(right));
}

std::optional<entity::TypeDescriptor> common_type_float_integer(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If used only in this file mark it as static

Suggested change
std::optional<entity::TypeDescriptor> common_type_float_integer(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {
static std::optional<entity::TypeDescriptor> common_type_float_integer(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {

}
}

std::optional<entity::TypeDescriptor> common_type_mixed_sign_ints(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::optional<entity::TypeDescriptor> common_type_mixed_sign_ints(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {
static std::optional<entity::TypeDescriptor> common_type_mixed_sign_ints(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants