[SPARK-49275][SQL] Fix return type nullness of the xpath expression #47796
@HyukjinKwon @dongjoon-hyun Could you help review? Thanks!
I believe this was added for Hive compat. Can we check if the behaviour is the same?
@HyukjinKwon Unfortunately, the behavior is not the same. Running the query in Hive gives a different result:
This is a day-1 issue since the I'm not sure about the next step. We could make Spark consistent with Hive, but that would be a breaking change for Spark.
What's the result before this change in Spark? If it was throwing an exception before, it would be better to match Hive's for now.
If we can justify the difference in behaviour from Hive, that works for me too.
In However, in
But will return an empty string after the change.
ah gotya so it's an existing behaviour okie gotya
@chenhao-db Do the previous Spark versions suffer from the issue too like
@MaxGekk Yes, I believe this issue has existed since
+1, LGTM. Merging to master/3.5.
@chenhao-db The changes cause conflicts in
### What changes were proposed in this pull request?

The `xpath` expression incorrectly marks its return type as an array of non-null strings. However, it can actually return an array containing nulls. This can cause an NPE in code generation, for example in the query `select coalesce(xpath(repeat('<a></a>', id), 'a')[0], '') from range(1, 2)`.

### Why are the changes needed?

It avoids potential failures in queries that use the `xpath` expression.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test. It would fail without the change in the PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47796 from chenhao-db/fix_xpath_nullness.

Authored-by: Chenhao Li <chenhao.li@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?

This is a cherry-pick of #47796.

The `xpath` expression incorrectly marks its return type as an array of non-null strings. However, it can actually return an array containing nulls. This can cause an NPE in code generation, for example in the query `select coalesce(xpath(repeat('<a></a>', id), 'a')[0], '') from range(1, 2)`.

### Why are the changes needed?

It avoids potential failures in queries that use the `xpath` expression.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test. It would fail without the change in the PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47959 from chenhao-db/fix_xpath_nullness_3.5.

Authored-by: Chenhao Li <chenhao.li@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?

The `xpath` expression incorrectly marks its return type as an array of non-null strings. However, it can actually return an array containing nulls. This can cause an NPE in code generation, for example in the query `select coalesce(xpath(repeat('<a></a>', id), 'a')[0], '') from range(1, 2)`.

### Why are the changes needed?

It avoids potential failures in queries that use the `xpath` expression.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test. It would fail without the change in the PR.

### Was this patch authored or co-authored using generative AI tooling?

No.
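
To illustrate how an array returned by `xpath` can contain a null element, here is a minimal sketch that uses only the JDK's `javax.xml.xpath` and DOM APIs, not Spark itself. It assumes the expression collects each matched node's value through the standard DOM `getNodeValue` call, which is defined to return `null` for element nodes. The class `XPathNullDemo` and the helper `nodeValues` are hypothetical names introduced for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathNullDemo {
    // Hypothetical helper: evaluates `path` against `xml` and collects the
    // DOM node value of every matched node, keeping nulls as they are.
    static List<String> nodeValues(String xml, String path) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // Element nodes have no node value, so this is null for <a></a>.
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Same inputs as the query in the description with id = 1:
        // the path 'a' matches the element itself, whose node value is null.
        System.out.println(nodeValues("<a></a>", "a"));
        // Selecting the text node instead yields a non-null string.
        System.out.println(nodeValues("<a>b</a>", "a/text()"));
    }
}
```

Under that assumption, the first call yields a one-element list whose only element is `null`, which is why the element type of the result has to be declared nullable.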