Fix expr partial ord test #8908

Merged
merged 3 commits into from
Jan 22, 2024

mustafasrepo
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

The test_partial_ord test exercises the partial_cmp implementation of the datafusion_expr::expr::Expr enum. That implementation compares expressions by their hash values, so the only guarantee it offers is that comparisons are consistent; the result need not reflect any meaningful ordering of the expressions themselves. This PR changes the misleading test so that it checks the behaviour the implementation actually provides.

The current test depends on the hashing algorithm, so its result can change whenever the algorithm (or its seed) changes. In fact, the test fails when ahash crate version 0.8.7 is used.
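To make the point concrete, here is a minimal sketch of a hash-based ordering and of the kind of property a test can safely assert about it. ToyExpr, hash_of and comparisons_are_consistent are hypothetical names used only for illustration, and the standard library's DefaultHasher stands in for ahash; this is not the actual DataFusion code.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for `datafusion_expr::expr::Expr`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ToyExpr {
    Column(String),
    Literal(i64),
}

// Hash a value with the standard library's default hasher
// (an assumption for this sketch; the real code uses a different hasher).
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Order expressions by comparing their hash values.
impl PartialOrd for ToyExpr {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        hash_of(self).partial_cmp(&hash_of(other))
    }
}

// The only property a test can rely on: `a < b` and `a >= b`
// always give opposite answers, whichever way the hashes fall.
fn comparisons_are_consistent(a: &ToyExpr, b: &ToyExpr) -> bool {
    (a < b) != (a >= b)
}
```

Asserting a specific direction such as `a < b` would tie the test to one particular hasher and seed, whereas the consistency check holds for any of them.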

What changes are included in this PR?

This PR fixes the problem described above.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jan 19, 2024
assert!(exp2 > exp1);
assert!(exp2 > exp3);
assert!(exp3 < exp2);
// Since comparisons are done using hash value of the expression
Contributor

Hm, but does that actually mean the comparison itself sometimes gives an incorrect result?

Contributor

I think the calculation is consistent (will always be either < or >)

However, I wonder if the hash values are always consistent (like perhaps do they vary on x86_64 and M1 platforms?) Does that matter 🤔

Contributor

Ahash is not a stable hash algorithm and may not only change between platforms, but also minor releases. I lack context into why we are ordering based on hashes, but my initial response is this is probably incorrect, especially if it is inconsistent with the equality relation.

Contributor

but my initial response is this is probably incorrect, especially if it is inconsistent with the equality relation.

Why is it "inconsistent with equality" ? If two exprs are equal, they would have the same hash, so the order between hash values would be consistent (on a certain platform and release)

That is not to say we shouldn't change how ordering is done, but I just don't understand this comment

Contributor

@tustvold tustvold Jan 21, 2024

I haven't checked this, but if the PartialEq implementation isn't also using this hash approach, the two would be inconsistent

Edit: as suspected it looks like PartialEq is being derived using a proc macro, and is therefore inconsistent with this implementation of PartialOrd - https://github.com/apache/arrow-datafusion/blob/main/datafusion/expr/src/expr.rs#L87. In particular PartialOrd could claim things to be equal due to a hash collision, when PartialEq would indicate they are not
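A minimal sketch of that inconsistency, under stated assumptions: CollidingExpr and weak_hash_of are hypothetical, and the Hash implementation is deliberately weak (it hashes only the name length) so that a collision is guaranteed rather than rare. With a real hasher such collisions are unlikely, but the mismatch between the two traits is the same in kind.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical type; `PartialEq` is derived, so equality compares the full name.
#[derive(Debug, PartialEq)]
struct CollidingExpr {
    name: String,
}

// Deliberately weak hash (name length only) to force a collision.
// It still satisfies "equal values have equal hashes".
impl Hash for CollidingExpr {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.len().hash(state);
    }
}

// Hash a value with the standard library's default hasher.
fn weak_hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Hash-based ordering: two values with colliding hashes compare as `Equal`.
impl PartialOrd for CollidingExpr {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        weak_hash_of(self).partial_cmp(&weak_hash_of(other))
    }
}

// Two distinct values whose hashes collide under the weak hash above.
fn colliding_pair() -> (CollidingExpr, CollidingExpr) {
    (
        CollidingExpr { name: "ab".to_string() },
        CollidingExpr { name: "cd".to_string() },
    )
}
```

For the pair returned by colliding_pair, partial_cmp reports Equal while == reports false, which is the kind of disagreement between PartialOrd and PartialEq described above.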

Contributor

@alamb alamb left a comment

Thank you @mustafasrepo and everyone

Note we saw failures of this test while verifying the 35.0.0 release: https://lists.apache.org/thread/onbs8l0w5s7693fchpyvwwgh61gf1jf8

I believe we should merge this PR as it is a test only change and it will make the test match what the code is doing

However, as has been mentioned, comparing hashes is probably not a good idea in general so I filed #8932 to track improving PartialOrd

BTW if anyone has a practical reason / example why comparing hashes is bad (e.g. bugs it could cause or other issues observable by customers) please add them to #8932 as that will help prioritize getting it done

@mustafasrepo
Contributor Author

I believe we should merge this PR as it is a test only change and it will make the test match what the code is doing

I agree with you in this regard. I will go ahead and merge this. It is really weird that PartialOrd is implemented this way. However, we can continue the discussion about how to proceed with the existing implementation (whether to change it, remove it, etc.) in #8932.

@mustafasrepo mustafasrepo merged commit c0a69a7 into apache:main Jan 22, 2024
22 checks passed
4 participants