[SPARK-34152][SQL] Make CreateViewStatement.child to be LogicalPlan's children so that it's resolved in analyze phase #31273
Conversation
@@ -2029,7 +2029,7 @@ class DataSourceV2SQLSuite
   test("CREATE VIEW") {
     val v = "testcat.ns1.ns2.v"
     val e = intercept[AnalysisException] {
-      sql(s"CREATE VIEW $v AS SELECT * FROM tab1")
+      sql(s"CREATE VIEW $v AS SELECT 1")
This needs to be updated now that `ResolveSessionCatalog` handles `CreateViewStatement` only if its `child` is resolved.
case plan if !plan.resolved => plan.expressions.flatMap(_.collect {
  case s @ SubqueryAlias(_, view: View) if view.isTempView =>
    Seq(s.identifier.qualifier :+ s.identifier.name)
  case s: SubqueryAlias if s.getTagValue(SUBQUERY_TYPE_TAG).exists(_ == "tempView") =>
This seems a bit hacky, but `SubqueryAlias(_, view: View)` couldn't handle all the cases. For example:

spark.range(10).createTempView("t")
sql("CREATE VIEW v AS SELECT * FROM t")

The `child` is:

Project [id#16L]
+- SubqueryAlias t
   +- Range (0, 10, step=1, splits=Some(2))
@cloud-fan, @viirya Any suggestion to capture the view properly? Does it make sense to add an `isView` field to `SubqueryAlias`?
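The options being debated can be sketched with stand-in case classes (hypothetical `Range`/`SubqueryAlias` stand-ins, not Spark's real classes): an explicit `isTempView`-style field makes the detection a direct pattern match instead of relying on a tag.

```scala
// Stand-in types for illustration only; Spark's real classes differ.
case class Range(n: Long)
case class SubqueryAlias(name: String, child: Any, isTempView: Boolean = false)

// spark.range(10).createTempView("t") yields a bare SubqueryAlias over a
// Range, with no View wrapper, so `SubqueryAlias(_, view: View)` never
// matches it.
val plan = SubqueryAlias("t", Range(10), isTempView = true)

// With an explicit field, temp-view detection is a direct pattern match:
val tempViews = Seq(plan).collect { case s: SubqueryAlias if s.isTempView => s.name }
assert(tempViews == Seq("t"))
```

The tag-based variant in the diff above achieves the same thing, but the state lives in a mutable tag rather than the node's constructor, which is what makes it feel hacky.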
Or is it safe to do `case s: SubqueryAlias if catalog.isTempView(s.identifier.name)`?

I see the comment "After replacement, it is impossible to detect whether the SubqueryAlias is added/generated from a temporary view." If we can add a field to `SubqueryAlias`, this should be easy to handle.
This is a good point. We probably should use `View` to wrap temp views as well, to be truly consistent with permanent views. cc @linhongliu-db
Thanks, will try that.
+1 if it works; adding a tag seems hacky.
Shall we not wrap dataframe temp views with `View`? The original goal is to make SQL temp views and permanent views consistent.
OK. In that case, we are left with two options:
1. Add an `isView: Boolean` field to `SubqueryAlias`, or
2. Introduce a new logical plan for handling dataframe temp views.
I will try 2) if there is no opposition to the approach.
OK, I get your point now. If we need a new logical plan, then it's better to reuse `View` with some modification. +1 to make `View.desc` optional and set it to `None` for dataframe temp views. Or we can generate column names for dataframe temp views in `View.desc`, like `col1, col2, ...`, to bypass the check.
@cloud-fan / @viirya I updated to make `View.desc` optional to support dataframe temp views. Could you please check again? Thanks!
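The resulting shape can be sketched with hypothetical stand-in types (not Spark's actual `CatalogTable`/`View`): making the description optional lets dataframe temp views, which have no `CatalogTable`/`viewText`, still be wrapped in `View`.

```scala
// Stand-in types for illustration only; Spark's real classes differ.
case class CatalogTable(identifier: String, viewText: Option[String])
case class View(desc: Option[CatalogTable], isTempView: Boolean, child: Any)

// SQL temp view: description is available, child is parsed from viewText.
val sqlView = View(
  Some(CatalogTable("t", Some("SELECT 1"))),
  isTempView = true,
  child = "plan parsed from viewText")

// Dataframe temp view: no viewText exists, so desc is None.
val dfView = View(None, isTempView = true, child = "analyzed dataframe plan")

assert(sqlView.desc.isDefined)
assert(dfView.desc.isEmpty)
```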
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (outdated, resolved)
-val e = cls.getConstructor(classOf[Seq[Expression]], clsForUDAF, classOf[Int], classOf[Int])
-  .newInstance(input,
-    clazz.getConstructor().newInstance().asInstanceOf[Object], Int.box(1), Int.box(1))
+val e = cls.getConstructor(
There are some UDAF-related changes; are they related to the view stuff here?
Yea, we need to collect temp functions by name. Currently, UDAF doesn't store a name, so I added it.
// Inserting into a file-based temporary view is allowed
// (e.g., spark.read.parquet("path").createOrReplaceTempView("t")).
// Thus, we need to look at the raw plan of a temporary view.
getTempViewRawPlan(relation) match {
Ah sorry, I missed this. The relation here can be a table relation, not a temp view, so `unwrapRelationPlan` may be a better name. We can fix it in a followup if there are no other comments to address.
-* child should be a logical plan parsed from the `CatalogTable.viewText`, should throw an error
-* if the `viewText` is not defined.
+* A container for holding the view description (CatalogTable) and info whether the view is temporary
+* or not. If the view description is available, the child should be a logical plan parsed from the
Re "If the view description is available": it's always available now. This should be "if it's a SQL (temp) view, ...".
    inputPlan: LogicalPlan,
    expectedPlan: LogicalPlan,
    caseSensitive: Boolean = true): Unit = {
  checkAnalysisWithTransform(inputPlan, expectedPlan, caseSensitive) { plan =>
Shall we inline `checkAnalysisWithTransform` here?
@@ -111,12 +111,11 @@ case class CreateViewCommand(

   // When creating a permanent view, not allowed to reference temporary objects.
   // This should be called after `qe.assertAnalyzed()` (i.e., `child` can be resolved)
-  verifyTemporaryObjectsNotExists(catalog, isTemporary, name, child)
+  verifyTemporaryObjectsNotExists(catalog, isTemporary, name, analyzedPlan)
Since `child` is already resolved, we don't need to call `sparkSession.sessionState.executePlan(child)` and re-analyze it now.
`CacheTableAsSelectExec.query` may not be analyzed? (Other places seem fine.)
Just analyze it in `CacheTableAsSelectExec` and remove the analyzing here?
OK, let's fix it in a followup then. We probably should also fix `CacheTableAsSelect` and resolve the `query` in the analyzer.
@cloud-fan if we remove `sparkSession.sessionState.executePlan(child)` and use `child` directly, do you think there would be an issue similar to `CacheTable`, since `child` will be an optimized plan, not an analyzed plan?
Thanks, merging to master!
Thanks @cloud-fan and @viirya for the review!
@imback82 @cloud-fan @viirya after this PR, when running the UTs of
Is this expected behavior?
Thanks @LuciferYang, let me take a look.
@LuciferYang thanks for reporting! I'm fixing it in #31650
…'t exist

What changes were proposed in this pull request?
This PR fixes a mistake in #31273. When CREATE OR REPLACE a temp view, we need to uncache the to-be-replaced existing temp view. However, we shouldn't uncache if there is no existing temp view. This doesn't cause real issues because the uncache action is failure-safe, but it produces a lot of warning messages.

Why are the changes needed?
Avoid unnecessary warning logs.

Does this PR introduce any user-facing change?
No.

How was this patch tested?
Manually ran tests and checked the warning messages.

Closes #31650 from cloud-fan/warnning.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
 } else {
-  aliasedPlan
+  TemporaryViewRelation(
+    prepareTemporaryViewFromDataFrame(name, aliasedPlan),
Here is a mistake: in the `if` branch, we call `prepareTemporaryView(viewIdent, ...)`, and here we should also use `viewIdent` instead of `name`. Global temp views have a 2-part name, and the `name` may not be qualified (e.g. `df.createGlobalTempView`).

@imback82 can you help to fix it with a test?
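Why the qualified identifier matters can be sketched with a stand-in `TableIdentifier` (hypothetical, though similar in shape to Spark's): a global temp view lives under the global-temp database (by default `global_temp`), so the qualified `viewIdent` carries a database part that the bare `name` from `df.createGlobalTempView` lacks.

```scala
// Stand-in type for illustration only; Spark's real TableIdentifier differs.
case class TableIdentifier(name: String, database: Option[String] = None)

// What `name` may look like when passed from df.createGlobalTempView:
val name = TableIdentifier("v")

// What the qualified viewIdent carries for a global temp view:
val viewIdent = TableIdentifier("v", Some("global_temp"))

// Storing `name` instead of `viewIdent` silently drops the database part,
// which is the bug being pointed out here.
assert(name.database.isEmpty)
assert(viewIdent.database.contains("global_temp"))
```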
good catch. will fix.
Created #31783. Thanks!
… correctly stored

What changes were proposed in this pull request?
This PR proposed to fix a bug introduced in #31273 (https://github.com/apache/spark/pull/31273/files#r589494855).

Why are the changes needed?
This fixes a bug where the global temp view's database name was not passed correctly.

Does this PR introduce any user-facing change?
Yes, now the global temp view's database is correctly stored.

How was this patch tested?
Added a new test that catches the bug.

Closes #31783 from imback82/SPARK-34152-bug-fix.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…alysis phase, and AlterViewAs should invalidate the cache

What changes were proposed in this pull request?
This PR proposes the following:
- `AlterViewAs.query` is currently analyzed in the physical operator `AlterViewAsCommand`, but it should be analyzed during the analysis phase.
- When `spark.sql.legacy.storeAnalyzedPlanForView` is set to true, store `TemporaryViewRelation` which wraps the analyzed plan, similar to #31273.
- Try to uncache the view you are altering.

Why are the changes needed?
Analyzing a plan should be done in the analysis phase if possible. Not uncaching the view (existing behavior) seems like a bug since the cache may not be used again.

Does this PR introduce any user-facing change?
Yes, now the view can be uncached if it's already cached.

How was this patch tested?
Added new tests around uncaching. The existing tests such as `SQLViewSuite` should cover the analysis changes.

Closes #31652 from imback82/alter_view_child.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…d take/return more concrete types

What changes were proposed in this pull request?
Now that all the temporary views are wrapped with `TemporaryViewRelation` (#31273, #31652, and #31825), this PR proposes to update `SessionCatalog`'s APIs for temporary views to take or return more concrete types.

APIs that will take `TemporaryViewRelation` instead of `LogicalPlan`:
createTempView, createGlobalTempView, alterTempViewDefinition

APIs that will return `TemporaryViewRelation` instead of `LogicalPlan`:
getRawTempView, getRawGlobalTempView

APIs that will return `View` instead of `LogicalPlan`:
getTempView, getGlobalTempView, lookupTempView

Why are the changes needed?
Internal refactoring to work with more concrete types.

Does this PR introduce any user-facing change?
No, this is internal refactoring.

How was this patch tested?
Updated existing tests affected by the refactoring.

Closes #31906 from imback82/use_temporary_view_relation.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This PR proposes to make `CreateViewStatement.child` to be `LogicalPlan`'s `children` so that it's resolved in the analyze phase.

Why are the changes needed?
Currently, `CreateViewStatement.child` is resolved when the create view command runs, which is inconsistent with other plan resolutions. For example, you may see the following in the physical plan:

Does this PR introduce any user-facing change?
Yes. For the example, you will now see the resolved plan:

How was this patch tested?
Updated existing tests.