-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark] Fix stats with tightBounds check for AddFiles with deletionVectors #3633
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
* | ||
* This is abstract to force the implementers of all operations to think about this setting. | ||
* All operations should add a comment justifying this setting. | ||
* Any operation that sets this to false should add a test in TightBoundsSuite. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to enforce the test? I feel like people might just cargo-cult the override + comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid that has to be up to good reviewing practice...
I have found examples of some tests that e.g. use reflection to check the number of fields in some class and then comment there what needs to be done if a new one is added.
I can imagine using reflection to find all subclasses of DeltaOperation, listing the ones that already define this as false and have a test, and instruct in that test that if you add a new subclass that overrides it to false then you should add a test, but I think it's a bit overkill...
…letionVectorStatsAreNotTightBounds (#3641) Post merge review followup: update and fix some comments. Co-authored-by: Julek Sompolski <Juliusz Sompolski>
Which Delta project/connector is this regarding?
Description
Check
DeletionVectorFilesHaveWideBoundsCheck
has been disabled for COMPUTE STATS because it reintroduces stats with tight bound to files with Deletion Vectors. However, there are other operations that can then copy over these AddFile actions with DVs and tight stats. These operations resulted in DELTA_ADDING_DELETION_VECTORS_WITH_TIGHT_BOUNDS_DISALLOWED error, which was a false positive.In this PR we also attempt to introduce and discuss a "framework" for checks like that as a property of DeltaOperations, with DeltaOperations declaring as a member method whether a certain property and check should be performed. This is opposed to current practice, where many places in the code feature special cases like matching against a certain DeltaOperation and doing something special; this kind of code is very decentralized, and it's easy to miss if any new place or new operation needs such central handling. If this was centralized in DeltaOperations, this could lead to better discoverability of special cases and edge cases when implementing new operations.
How was this patch tested?
Tests added.
Does this PR introduce any user-facing changes?
No.