[SPARK-32755][SQL] Maintain the order of expressions in AttributeSet and ExpressionSet #29598
ok to test
Test build #128101 has finished for PR 29598 at commit
Test build #128102 has finished for PR 29598 at commit
Test build #128110 has finished for PR 29598 at commit
Test build #128145 has finished for PR 29598 at commit
Test build #128149 has finished for PR 29598 at commit
jenkins retest this please
Test build #128153 has finished for PR 29598 at commit
Test build #128156 has finished for PR 29598 at commit
LGTM
}
def -(elem: Expression): ExpressionSet = {
  val newSet = clone()
  newSet.remove(elem)
Isn't this more efficient?
ExpressionSet(baseSet.filter(_ != e.canonicalized), originals.filter(_.canonicalized != e.canonicalized))
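To make the trade-off in this suggestion concrete, here is a minimal sketch on a toy model, not Spark's actual classes. `Expr`, its `canonicalized` method, and `ToyExpressionSet` are hypothetical stand-ins; the only assumption taken from the snippet above is that the set tracks canonical forms in `baseSet` and the original expressions in `originals`. Both variants produce the same result; the filter-based one builds it directly instead of copying and then mutating.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Catalyst expression: two expressions count as
// semantically equal when their canonical forms are equal.
final case class Expr(text: String) {
  def canonicalized: String = text.toLowerCase.replace(" ", "")
}

// Toy model (an assumption, not Spark's implementation): a set of canonical
// forms plus the original expressions in insertion order.
final class ToyExpressionSet(
    val baseSet: mutable.LinkedHashSet[String] = mutable.LinkedHashSet.empty[String],
    val originals: mutable.ArrayBuffer[Expr] = mutable.ArrayBuffer.empty[Expr]) {

  // Keeps only the first expression seen for each canonical form.
  def add(e: Expr): Unit =
    if (baseSet.add(e.canonicalized)) originals += e

  // Variant in the style of the diff: copy everything, then remove in place.
  def minusByClone(e: Expr): ToyExpressionSet = {
    val copy = new ToyExpressionSet(baseSet.clone(), originals.clone())
    copy.baseSet.remove(e.canonicalized)
    val kept = copy.originals.filter(_.canonicalized != e.canonicalized)
    copy.originals.clear()
    copy.originals ++= kept
    copy
  }

  // Variant in the style of the review comment: build the result by filtering.
  def minusByFilter(e: Expr): ToyExpressionSet =
    new ToyExpressionSet(
      baseSet.filter(_ != e.canonicalized),
      originals.filter(_.canonicalized != e.canonicalized))

  def toSeq: Seq[Expr] = originals.toSeq
}
```

Whether the filter-based variant is actually faster depends on the set size and would need to be measured; the sketch only shows that the two are behaviorally equivalent and that neither mutates the receiver.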
@@ -27,6 +27,10 @@ object ExpressionSet {
    expressions.foreach(set.add)
    set
  }
We should apply the same change in `ExpressionSet` under the `scala-2.13` source tree. @dbaliafroozeh can you open a followup PR?
@cloud-fan good catch, I thought I already deleted the ExpressionSet in 2.13. Note that we don't want it anymore as ExpressionSet doesn't extend Set anymore. I'll open a followup PR for that.
### What changes were proposed in this pull request?
This PR is a followup on #29598 and removes the `ExpressionSet` class from the 2.13 branch.

### Why are the changes needed?
`ExpressionSet` does not extend Scala `Set` anymore and this class is no longer needed in the 2.13 branch.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Passes existing tests

Closes #29648 from dbaliafroozeh/RemoveExpressionSetFrom2.13Branch.

Authored-by: Ali Afroozeh <ali.afroozeh@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Sorry to leave a message on a completed issue. @cloud-fan @dbaliafroozeh This patch seems to introduce different behavior between Scala 2.12 and Scala 2.13. I found that the number of failed cases increased with this patch in the sub-suites of For example, if we execute
The test result without this patch is
and with this patch is
I haven't found the root cause yet. Do you have any good ideas for fixing this problem?
This should have been merged before we have If your spark fork has different golden files for
@cloud-fan Maybe I didn't describe it clearly. I am now using the master branch of the Spark source.
Executing the Maven tests with Scala 2.12: all tests passed.
Executing the Maven tests with Scala 2.13: 31 TESTS FAILED.
Without this patch, executing the Maven tests with both Scala 2.12 and Scala 2.13
So do we always need to regenerate the golden files with Scala 2.13? Or do we need to use different golden files for different Scala versions? That feels a little unreasonable... Or do you mean the additional failures in Scala 2.13 are caused by other, unknown reasons?
Interesting. So
@Ngone51 need some simple fix on compilation for Scala 2.13 ,
@LuciferYang do you have a branch that contains the compilation fix?
@cloud-fan @Ngone51 we can use #29660
@cloud-fan @Ngone51 I think the reason for this problem is spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala Lines 106 to 113 in e7d9a24
In Scala 2.12 it uses the implementation of
In Scala 2.13 it uses the implementation of
From the above code we can see that in Scala 2.13 after
Maybe we can use
@cloud-fan @LuciferYang we can also try to use Java's LinkedHashSet here if there is a difference between different versions of Scala's mutable.LinkedHashSet.
@dbaliafroozeh @LuciferYang can you open a PR that basically uses 2.12 implementation inside AttributeSet?
@hvanhovell Ok ~ I will give a new followup pr
@hvanhovell use
I'd prefer
Anything that explicitly maintains the insertion order (i.e. returns a LinkedHashSet) will do :).
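As a minimal sketch of the direction discussed here (not the code of the follow-up PR), an insertion-ordered set can be backed by Java's `java.util.LinkedHashSet`, whose iteration order is insertion order regardless of which Scala version is used. `OrderedIds` and all of its methods are hypothetical names for illustration only; plain `Long` ids stand in for attribute ids.

```scala
import java.util.{LinkedHashSet => JLinkedHashSet}
import scala.collection.mutable.ArrayBuffer

// Hypothetical wrapper, for illustration only: a set of ids that always
// iterates in insertion order because it is backed by java.util.LinkedHashSet.
final class OrderedIds private (private val underlying: JLinkedHashSet[Long]) {

  def contains(id: Long): Boolean = underlying.contains(id)

  // Returns a new set; the receiver is left untouched.
  def +(id: Long): OrderedIds = {
    val copy = new JLinkedHashSet[Long](underlying)
    copy.add(id)
    new OrderedIds(copy)
  }

  // Union keeps the left operand's order, then appends unseen ids from `other`.
  def ++(other: OrderedIds): OrderedIds = {
    val copy = new JLinkedHashSet[Long](underlying)
    copy.addAll(other.underlying)
    new OrderedIds(copy)
  }

  // Materializes the elements in insertion order.
  def toSeq: Seq[Long] = {
    val out = ArrayBuffer.empty[Long]
    val it = underlying.iterator()
    while (it.hasNext) out += it.next()
    out.toList
  }
}

object OrderedIds {
  def apply(ids: Long*): OrderedIds = {
    val set = new JLinkedHashSet[Long]()
    ids.foreach(id => set.add(id))
    new OrderedIds(set)
  }
}
```

With this shape, every operation that returns a set explicitly constructs a `LinkedHashSet`, so the ordering does not depend on what a collection library method happens to return.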
What changes were proposed in this pull request?
This PR changes `AttributeSet` and `ExpressionSet` to maintain the insertion order of the elements. More specifically, we:
- change the underlying data structure of `AttributeSet` from `HashSet` to `LinkedHashSet` to maintain the insertion order.
- `ExpressionSet` already uses a list to keep track of the expressions. However, since it extends Scala's `immutable.Set` class, operations such as `map` and `flatMap` are delegated to `immutable.Set` itself. This means that the result of these operations is no longer an instance of `ExpressionSet`, but rather an implementation picked by the parent class. We also remove this inheritance from `immutable.Set` and implement the needed methods directly. `ExpressionSet` has very specific semantics, and it does not make sense for it to extend `immutable.Set` anyway.
- change the `PlanStabilitySuite` to not sort the attributes, to be able to catch changes in the order of expressions in different runs.

Why are the changes needed?
Expression identity is based on the `ExprId`, which is an auto-incremented number. This means that the same query can yield a query plan with different expression ids in different runs. `AttributeSet` and `ExpressionSet` internally use a `HashSet` as the underlying data structure, and therefore cannot guarantee a fixed order of operations in different runs. This can be problematic in cases where we would like to check for plan changes in different runs.

Does this PR introduce any user-facing change?
No
How was this patch tested?
Passes `PlanStabilitySuite` after regenerating the golden files.
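The point about dropping the `immutable.Set` inheritance can be illustrated with a small sketch. It assumes nothing about Spark's real classes: `OrderedSet` below is a hypothetical container that does not extend `Set` and implements `map`, `flatMap`, and `filter` itself, so the result keeps the container's own type and its insertion order.

```scala
import scala.collection.mutable

// Hypothetical, simplified container: deduplicates elements while keeping
// insertion order, and deliberately does not extend scala.collection.Set.
final class OrderedSet[A] private (private val elems: Vector[A]) {

  def contains(a: A): Boolean = elems.contains(a)

  // Appends `a` only if it is not already present.
  def +(a: A): OrderedSet[A] =
    if (contains(a)) this else new OrderedSet(elems :+ a)

  // Implemented directly, so the result is again an OrderedSet and the order
  // of first occurrence is preserved after deduplication.
  def map[B](f: A => B): OrderedSet[B] = OrderedSet(elems.map(f): _*)

  def flatMap[B](f: A => Iterable[B]): OrderedSet[B] = OrderedSet(elems.flatMap(f): _*)

  def filter(p: A => Boolean): OrderedSet[A] = new OrderedSet(elems.filter(p))

  def toSeq: Seq[A] = elems
}

object OrderedSet {
  // Deduplicates the arguments, keeping the first occurrence of each element.
  def apply[A](as: A*): OrderedSet[A] = {
    val seen = mutable.LinkedHashSet.empty[A]
    as.foreach(seen += _)
    new OrderedSet(seen.toVector)
  }
}
```

Because `map` and `flatMap` are defined on the class itself, there is no parent collection implementation that could choose a different result type or ordering.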