
8371297134 Type promotion between integral and floating types #2173

Open

poodlewars wants to merge 7 commits into master from aseaton/8371297134/type-promotion

Collaborator

poodlewars commented Feb 10, 2025

This fixes the bug reported in 8371297134:

Write an int64 column
Append a float32 to it
Read it back -> we blow up at read time

I've added logic so that when we merge descriptors we combine:

any integer + float64 -> float64
integer of up to 16 bits + float32 -> float32
integer of more than 16 bits + float32 -> float64

This is because every 16-bit integer fits exactly in a 32-bit float (whose significand has 24 bits), whereas a 32-bit integer does not. We could instead always promote to float64, which would be simpler but more wasteful.
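A minimal C++ sketch of these merge rules, assuming a simplified type model. `Kind`, `Bits`, `Type` and `exact_in_float32` are hypothetical names for illustration, not ArcticDB's `entity::TypeDescriptor` API, and `common_type_float_integer` here only borrows its name from this PR:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified, hypothetical type model; ArcticDB's entity::TypeDescriptor differs.
enum class Kind { SignedInt, UnsignedInt, Float };
enum class Bits { S8 = 8, S16 = 16, S32 = 32, S64 = 64 };

struct Type {
    Kind kind;
    Bits bits;
    bool operator==(const Type& other) const { return kind == other.kind && bits == other.bits; }
};

bool is_float(const Type& t) { return t.kind == Kind::Float; }

// Common type for an integer/float pair, following the rules listed above.
// Returns std::nullopt if the pair is not a mix of one integer and one float.
std::optional<Type> common_type_float_integer(const Type& left, const Type& right) {
    if (is_float(left) == is_float(right))
        return std::nullopt;
    const Type& floating = is_float(left) ? left : right;
    const Type& integral = is_float(left) ? right : left;
    // float32 has a 24-bit significand, so it holds every 8- and 16-bit integer exactly.
    if (floating.bits == Bits::S32 && static_cast<int>(integral.bits) <= 16)
        return Type{Kind::Float, Bits::S32};
    // Everything else (a float64 input, or a 32/64-bit integer with float32) becomes float64.
    return Type{Kind::Float, Bits::S64};
}

// True if v survives a round trip through float32 unchanged.
bool exact_in_float32(std::int32_t v) {
    return static_cast<std::int32_t>(static_cast<float>(v)) == v;
}
```

`exact_in_float32` makes the precision argument concrete: it holds for every 16-bit value, but 16777217 (2^24 + 1) does not survive a round trip through float32.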

Separately, for query builder pipelines, I changed the type arithmetic so that combining a float and an integer always promotes to float64. This misses the cases where promoting to float32 would actually be safe, but it is simpler and matches Pandas. This is the cause of the test change in test_sort_merge.py.
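The two rules only disagree when a float32 is combined with an 8- or 16-bit integer. A small sketch contrasting them, using a made-up `DType` struct; `merge_float_int` and `project_float_int` are hypothetical names, not functions from this PR:

```cpp
#include <cassert>

// Hypothetical, simplified dtype model: a float flag plus a bit width (8, 16, 32 or 64).
struct DType {
    bool is_float;
    int bits;
};

// Descriptor-merge rule: keep float32 only when the integer has at most 16 bits.
DType merge_float_int(DType floating, DType integral) {
    assert(floating.is_float && !integral.is_float);
    if (floating.bits == 32 && integral.bits <= 16)
        return {true, 32};
    return {true, 64};
}

// Query-builder projection rule: any float/integer mix yields float64.
DType project_float_int(DType floating, DType integral) {
    assert(floating.is_float && !integral.is_float);
    return {true, 64};
}
```

For int16 with float32, the merge keeps float32 while the projection produces float64; that gap is the trade-off accepted here for simplicity.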

I reworked has_valid_type_promotion, introducing a new function is_valid_type_promotion_to_target. The has_valid_type_promotion signature was dangerous: it returned an optional type that was often interpreted only as a bool, with the actual type inside it ignored. I've replaced the call sites that only tested the bool with calls to the new is_valid_type_promotion_to_target, which returns just a bool.
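The hazard is easiest to see with mixed-sign integers. Below is a hedged sketch over a simplified `IntType` model; only the two function names come from this PR, and the promotion rule inside `has_valid_type_promotion` is an assumed stand-in (same signedness: the wider type; mixed signs: a signed type wide enough for both), not a copy of the real code:

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Simplified integer-type model for illustration; not ArcticDB's TypeDescriptor.
struct IntType {
    bool is_signed;
    int bits;  // 8, 16, 32 or 64
    bool operator==(const IntType& other) const {
        return is_signed == other.is_signed && bits == other.bits;
    }
};

// Old-style API (assumed rule): returns the common type of source and target, if any.
// The result is not necessarily `target`, which is what makes a bool-only check unsafe.
std::optional<IntType> has_valid_type_promotion(IntType source, IntType target) {
    if (source.is_signed == target.is_signed)
        return IntType{source.is_signed, std::max(source.bits, target.bits)};
    const IntType& s = source.is_signed ? source : target;
    const IntType& u = source.is_signed ? target : source;
    if (s.bits > u.bits)
        return s;  // a wider signed type holds every value of the unsigned one
    if (u.bits == 64)
        return std::nullopt;  // no signed type is wide enough for uint64
    return IntType{true, u.bits * 2};  // next signed width up
}

// New-style API: true only if source can be stored as exactly `target`.
bool is_valid_type_promotion_to_target(IntType source, IntType target) {
    const auto promoted = has_valid_type_promotion(source, target);
    return promoted.has_value() && *promoted == target;
}
```

In this sketch, `if (has_valid_type_promotion(uint32, int32))` is true because the optional holds int64, so a bool-only caller would accept writing uint32 values into an int32 column; `is_valid_type_promotion_to_target` rejects that case.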

poodlewars added 5 commits

February 10, 2025 17:01

Type promotion between integral and floating types

3f006eb

Update tests: I deliberately promote to float64 here so we don't lose precision from the int32

Make type arithmetic for float + int projections match Pandas

ad4b88b

Split has_valid_type_promotion up with is_valid_type_promotion_to_target that just returns a bool

557f796

To prevent dangerous uses where only the static_cast<bool> of its return value was used, but where the type to be promoted to was not the second argument given to it.

Fixup after rebase

064f14f

Remove a duplicated test I added by mistake

poodlewars requested a review from vasil-pashov

February 10, 2025 18:00

poodlewars marked this pull request as ready for review

February 11, 2025 09:22

poodlewars requested review from alexowens90 and willdealtry as code owners

February 11, 2025 09:23

vasil-pashov reviewed

cpp/arcticdb/entity/type_utils.cpp (resolved)

cpp/arcticdb/entity/type_utils.cpp Outdated

+                      auto target_size = entity::SizeBits::UNKNOWN_SIZE_BITS;
+                      auto floating_size = is_floating_point_type(left_type) ? left_size : right_size;
+                      auto integral_size = is_floating_point_type(left_type) ? right_size : left_size;

Collaborator

vasil-pashov Feb 12, 2025

nit: I think using is_integer_type would be easier to follow.

cpp/arcticdb/entity/type_utils.cpp Outdated

                           } else {
                               // Non-numeric target type
-                              return std::nullopt;
+                              return false;
                           }
                       } else if (is_floating_point_type(source_type)) {
                           if (is_unsigned_type(target_type) || is_signed_type(target_type)) {

Collaborator

vasil-pashov Feb 12, 2025

It's not part of the change, but this is a bit more readable.

Suggested change

-                        if (is_unsigned_type(target_type) || is_signed_type(target_type)) {
+                        if (is_integer_type(target_type)) {

cpp/arcticdb/entity/type_utils.cpp

                       }
-                      return target;
+                      auto target_size = entity::SizeBits(uint8_t(std::max(left_size, right_size)) + 1);

Collaborator

vasil-pashov Feb 12, 2025

I get why this works but it seems a bit hacky and fragile.

Collaborator Author

poodlewars Feb 20, 2025

This is just the old logic moved around, so I'd rather not change it.
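For readers weighing the fragility point, here is a sketch of what the `+ 1` relies on, next to an explicit alternative. The `SizeBits` layout below is an assumption about the real enum (check the actual header before relying on it), and both function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for entity::SizeBits. ASSUMPTION: the enumerators are consecutive and each
// is twice as wide as the one before; the "+ 1" trick depends entirely on this layout.
enum class SizeBits : std::uint8_t { UNKNOWN_SIZE_BITS = 0, S8 = 1, S16 = 2, S32 = 3, S64 = 4, COUNT = 5 };

// Arithmetic form, as in the diff: the size one step wider than the wider operand.
SizeBits next_wider_by_arithmetic(SizeBits left, SizeBits right) {
    return SizeBits(std::uint8_t(std::max(left, right)) + 1);
}

// Explicit form: spells out each step, so reordering the enum cannot silently break it.
SizeBits next_wider_explicit(SizeBits size) {
    switch (size) {
        case SizeBits::S8: return SizeBits::S16;
        case SizeBits::S16: return SizeBits::S32;
        case SizeBits::S32: return SizeBits::S64;
        default: return SizeBits::UNKNOWN_SIZE_BITS;  // no wider integer size exists
    }
}
```

In this sketch, stepping past S64 with the arithmetic form yields COUNT rather than a real size, so a caller would have to rule out 64-bit inputs before reaching it.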

cpp/arcticdb/entity/type_utils.cpp (resolved)

cpp/arcticdb/entity/type_utils.cpp Outdated

+                      return (is_integer_type(left) && is_floating_point_type(right)) || (is_floating_point_type(left) && is_integer_type(right));
+                  }
+                  std::optional<entity::TypeDescriptor> common_type_float_integer(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {

Collaborator

vasil-pashov Feb 12, 2025

nit: If this is only used in this file, mark it as static.

Suggested change

-                std::optional<entity::TypeDescriptor> common_type_float_integer(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {
+                static std::optional<entity::TypeDescriptor> common_type_float_integer(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {

cpp/arcticdb/entity/type_utils.cpp Outdated

+                      }
+                  }
+                  std::optional<entity::TypeDescriptor> common_type_mixed_sign_ints(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {

Collaborator

vasil-pashov Feb 12, 2025

Suggested change

-                std::optional<entity::TypeDescriptor> common_type_mixed_sign_ints(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {
+                static std::optional<entity::TypeDescriptor> common_type_mixed_sign_ints(const entity::TypeDescriptor& left, const entity::TypeDescriptor& right) {

python/tests/unit/arcticdb/version_store/test_column_type_changes.py (resolved)

poodlewars added 2 commits

February 20, 2025 11:09


Code review comments

0c4b25b

Add repro of a resample test that also fails on master, increase test tolerance as a result

04eeafb

vasil-pashov approved these changes

Reviewers

vasil-pashov approved these changes

alexowens90: awaiting requested review (code owner)

willdealtry: awaiting requested review (code owner)

At least 2 approving reviews are required to merge this pull request.

Labels

None yet