-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-47307][SQL][FOLLOWUP] Promote spark.sql.legacy.chunkBase64String.enabled from a legacy/internal config to a regular/public one #47410
Conversation
…legacy.chunkBase64String.enabled` to true
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
cc @gatorsmile too |
|
||
// check if unbase64 works well for chunked and non-chunked encoded strings | ||
checkEvaluation(StringDecode(UnBase64(Literal(encoded)), Literal("utf-8")), longString) | ||
checkEvaluation(StringDecode(UnBase64(Literal(chunkEncoded)), Literal("utf-8")), longString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this case to confirm that the chunkEncoded string can be decoded correctly. cc @cloud-fan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @gatorsmile and @cloud-fan
BTW, @wForget , if you change the configuration name in this PR, the PR title and description should be updated accordingly. Currently, the PR title and description is not matched. |
spark.sql.legacy.chunkBase64String.enabled
to true
Updated, thank you. |
Let's wait for @gatorsmile and @cloud-fan 's review. |
The test golden files need to be regenerated, I am doing it. |
Oh, this one?
|
yeah |
Merged to master. Hi @wForget, please help backport this to 3.5. Thank you |
…ng.enabled from a legacy/internal config to a regular/public one ### What changes were proposed in this pull request? + Promote spark.sql.legacy.chunkBase64String.enabled from a legacy/internal config to a regular/public one. + Add test cases for unbase64 ### Why are the changes needed? Keep the same behavior as before. More details: apache#47303 (comment) ### Does this PR introduce _any_ user-facing change? yes, revert behavior change introduced in apache#47303 ### How was this patch tested? existing unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#47410 from wForget/SPARK-47307_followup. Lead-authored-by: wforget <643348094@qq.com> Co-authored-by: Kent Yao <yao@apache.org> Signed-off-by: Kent Yao <yao@apache.org> (cherry picked from commit af5eb08)
It's good to keep the old behavior by default, but I'd like to understand the difference. Which one is more commonly supported in the ecosystem (decoding)? The chunked one or the unchunked one? |
Unchunking is the behavior before 3.2(inclusive). Chunking was the behavior introduced accidentally in 3.3. I suggested @wForget in #47303 to make chunking as legacy behavior and unchunking as default, considering that unchunking is more reasonable here and users may not upgrade so eagerly, while chunking is mostly for editors who can not handle long lines. For the unbase64 part, both are valid for decoding. |
Hive, Trino, and Clickhouse all seem to be unchunked. |
…4String.enabled from a legacy/internal config to a regular/public one Backports #47410 to 3.5 ### What changes were proposed in this pull request? + Promote spark.sql.legacy.chunkBase64String.enabled from a legacy/internal config to a regular/public one. + Add test cases for unbase64 ### Why are the changes needed? Keep the same behavior as before. More details: #47303 (comment) ### Does this PR introduce _any_ user-facing change? yes, revert behavior change introduced in #47303 ### How was this patch tested? existing unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes #47416 from wForget/SPARK-47307_followup_3.5. Authored-by: wforget <643348094@qq.com> Signed-off-by: Kent Yao <yao@apache.org>
…ng.enabled from a legacy/internal config to a regular/public one ### What changes were proposed in this pull request? + Promote spark.sql.legacy.chunkBase64String.enabled from a legacy/internal config to a regular/public one. + Add test cases for unbase64 ### Why are the changes needed? Keep the same behavior as before. More details: apache#47303 (comment) ### Does this PR introduce _any_ user-facing change? yes, revert behavior change introduced in apache#47303 ### How was this patch tested? existing unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#47410 from wForget/SPARK-47307_followup. Lead-authored-by: wforget <643348094@qq.com> Co-authored-by: Kent Yao <yao@apache.org> Signed-off-by: Kent Yao <yao@apache.org>
…ng.enabled from a legacy/internal config to a regular/public one ### What changes were proposed in this pull request? + Promote spark.sql.legacy.chunkBase64String.enabled from a legacy/internal config to a regular/public one. + Add test cases for unbase64 ### Why are the changes needed? Keep the same behavior as before. More details: apache#47303 (comment) ### Does this PR introduce _any_ user-facing change? yes, revert behavior change introduced in apache#47303 ### How was this patch tested? existing unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#47410 from wForget/SPARK-47307_followup. Lead-authored-by: wforget <643348094@qq.com> Co-authored-by: Kent Yao <yao@apache.org> Signed-off-by: Kent Yao <yao@apache.org>
…ng.enabled from a legacy/internal config to a regular/public one ### What changes were proposed in this pull request? + Promote spark.sql.legacy.chunkBase64String.enabled from a legacy/internal config to a regular/public one. + Add test cases for unbase64 ### Why are the changes needed? Keep the same behavior as before. More details: apache#47303 (comment) ### Does this PR introduce _any_ user-facing change? yes, revert behavior change introduced in apache#47303 ### How was this patch tested? existing unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#47410 from wForget/SPARK-47307_followup. Lead-authored-by: wforget <643348094@qq.com> Co-authored-by: Kent Yao <yao@apache.org> Signed-off-by: Kent Yao <yao@apache.org>
What changes were proposed in this pull request?
Why are the changes needed?
Keep the same behavior as before. More details: #47303 (comment)
Does this PR introduce any user-facing change?
yes, revert behavior change introduced in #47303
How was this patch tested?
existing unit test
Was this patch authored or co-authored using generative AI tooling?
No