Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42307][SQL] Assign name for error _LEGACY_ERROR_TEMP_2232 #47354

Closed
wants to merge 8 commits into from

Conversation

junyuc25
Copy link
Contributor

@junyuc25 junyuc25 commented Jul 15, 2024

What changes were proposed in this pull request?

In this PR, I propose to replace the legacy error name _LEGACY_ERROR_TEMP_2232 with ROW_VALUE_IS_NULL , and add a test case for it.

Why are the changes needed?

Proper name improves user experience with Spark SQL.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Ran all the tests in the suite:

build/sbt "testOnly *org.apache.spark.sql.RowSuite"

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jul 15, 2024
@junyuc25 junyuc25 changed the title [WIP][Draft] Assign name for error _LEGACY_ERROR_TEMP_2232 [SPARK-42307] Assign name for error _LEGACY_ERROR_TEMP_2232 Jul 16, 2024
@junyuc25 junyuc25 marked this pull request as ready for review July 16, 2024 03:32
@junyuc25
Copy link
Contributor Author

Hi @MaxGekk , I propose a PR to update the name for legacy errors. Do you mind taking a look? Thanks.

@junyuc25 junyuc25 changed the title [SPARK-42307] Assign name for error _LEGACY_ERROR_TEMP_2232 [SPARK-42307][SQL] Assign name for error _LEGACY_ERROR_TEMP_2232 Jul 16, 2024
Copy link
Contributor

@allisonwang-db allisonwang-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -123,4 +123,17 @@ class RowSuite extends SparkFunSuite with SharedSparkSession {
parameters = Map("methodName" -> "fieldIndex", "className" -> "Row", "fieldName" -> "`foo`")
)
}

test("SPARK-42307 - Trying to get the value from a null column should result in error") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
test("SPARK-42307 - Trying to get the value from a null column should result in error") {
test("SPARK-42307: get the value from a null column should result in error") {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment @allisonwang-db. I updated the test description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad I clicked the revoke review by mistake...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @allisonwang-db @cloud-fan @MaxGekk. Not sure if you got a chance to review this small PR? Thanks!

@junyuc25 junyuc25 requested a review from allisonwang-db July 20, 2024 03:14
@@ -123,4 +123,17 @@ class RowSuite extends SparkFunSuite with SharedSparkSession {
parameters = Map("methodName" -> "fieldIndex", "className" -> "Row", "fieldName" -> "`foo`")
)
}

test("SPARK-42307 - get the value from a null column should result in error") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test("SPARK-42307 - get the value from a null column should result in error") {
test("SPARK-42307: get a value from a null column should result in error") {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @allisonwang-db! Updated.

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct? If not, please, fix it.

* @throws NullPointerException when value is null.

@@ -572,6 +572,12 @@
],
"sqlState" : "42703"
},
"COLUMN_VALUE_IS_NULL" : {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's name in more precise way:

Suggested change
"COLUMN_VALUE_IS_NULL" : {
"ROW_VALUE_IS_NULL" : {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -572,6 +572,12 @@
],
"sqlState" : "42703"
},
"COLUMN_VALUE_IS_NULL" : {
"message" : [
"Value at index <index> is null."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Value at index <index> is null."
Found NULL in a row at the index <index>, expected a non-NULL value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"message" : [
"Value at index <index> is null."
],
"sqlState" : "22000"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this number "22023"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... name _LEGACY_ERROR_TEMP_2232 with COLUMN_VALUE_IS_NULL

Could you update PR's description according to your changes, please.

@@ -219,39 +219,34 @@ trait Row extends Serializable {
* Returns the value at position i as a primitive boolean.
*
* @throws ClassCastException when data type does not match.
* @throws NullPointerException when value is null.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify why did you remove it instead of replacing NullPointerException by SparkException, just wonder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops I misunderstood your previous comment. Replaced NPE with SparkException.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And one more thing is need to update the SQL migration guide because this might impact user apps and potentially break them. Please, add an item to sql-migration-guide.md.

@junyuc25
Copy link
Contributor Author

Hi @MaxGekk , looks like the current implementation is causing an error when generating docs (job link), because SparkException is declared in the comment but it is not thrown from the codes.

[error] /__w/spark/spark/sql/api/target/java/org/apache/spark/sql/Row.java:135:1:  error: exception not thrown: org.apache.spark.SparkException
[error]    * @throws org.apache.spark.SparkException when value is null.
[error]              ^
[warn] javadoc exited with exit code 1

I found two options to fix this:

  1. Add @throws[SparkException] annotation to the method. This is equivalent to adding throws XXException to a Java method signature (i.e. public boolean getBoolean (int i) throws org.apache.spark.SparkException), and hence Java consumers of this method would need to catch this exception explicitly.
  /**
   * Returns the value at position i as a primitive boolean.
   *
   * @throws ClassCastException when data type does not match.
   * @throws org.apache.spark.SparkException when value is null.
   */
  @throws[SparkException]
  def getBoolean(i: Int): Boolean = getAnyValAs[Boolean](i)
  1. Do not add @throws org.apache.spark.SparkException to the comment, and also remove @throws NullPointerException (because NPE would not be thrown from the codes).
  /**
   * Returns the value at position i as a primitive boolean.
   *
   * @throws ClassCastException when data type does not match.
   */
  def getBoolean(i: Int): Boolean = getAnyValAs[Boolean](i)

I kind of lean towards option #1 as it could provide users with better context about the exception type being thrown, although they would have to handle this exception explicitly in their codes. Let me know your thought on this or if you have other ideas. Thanks :)!

@MaxGekk
Copy link
Member

MaxGekk commented Aug 13, 2024

@junyuc25 I guess a runtime exception might not force catching the exception. How about to replace SparkException by SparkRuntimeException in valueIsNullError()?

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM waiting for CI.

@MaxGekk
Copy link
Member

MaxGekk commented Aug 13, 2024

+1, LGTM. Merging to master.
Thank you, @junyuc25 and @allisonwang-db for review.

@MaxGekk MaxGekk closed this in f081650 Aug 13, 2024
@MaxGekk
Copy link
Member

MaxGekk commented Aug 13, 2024

@junyuc25 Do you have an account at the OSS JIRA? If so, leave a comment in SPARK-42307. I will assign the ticket to you.

@junyuc25
Copy link
Contributor Author

junyuc25 commented Aug 14, 2024 via email

IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
### What changes were proposed in this pull request?

In this PR, I propose to replace the legacy error name `_LEGACY_ERROR_TEMP_2232` with `ROW_VALUE_IS_NULL `, and add a test case for it.

### Why are the changes needed?

Proper name improves user experience with Spark SQL.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Ran all the tests in the suite:
```
build/sbt "testOnly *org.apache.spark.sql.RowSuite"
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47354 from junyuc25/SPARK-42307.

Lead-authored-by: junyuc25 <10862251+junyuc25@users.noreply.github.com>
Co-authored-by: junyuc25 <=>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?

In this PR, I propose to replace the legacy error name `_LEGACY_ERROR_TEMP_2232` with `ROW_VALUE_IS_NULL `, and add a test case for it.

### Why are the changes needed?

Proper name improves user experience with Spark SQL.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Ran all the tests in the suite:
```
build/sbt "testOnly *org.apache.spark.sql.RowSuite"
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47354 from junyuc25/SPARK-42307.

Lead-authored-by: junyuc25 <10862251+junyuc25@users.noreply.github.com>
Co-authored-by: junyuc25 <=>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
### What changes were proposed in this pull request?

In this PR, I propose to replace the legacy error name `_LEGACY_ERROR_TEMP_2232` with `ROW_VALUE_IS_NULL `, and add a test case for it.

### Why are the changes needed?

Proper name improves user experience with Spark SQL.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Ran all the tests in the suite:
```
build/sbt "testOnly *org.apache.spark.sql.RowSuite"
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47354 from junyuc25/SPARK-42307.

Lead-authored-by: junyuc25 <10862251+junyuc25@users.noreply.github.com>
Co-authored-by: junyuc25 <=>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants