[SPARK-47637][SQL] Use errorCapturingIdentifier in more places #45764

Closed


srielau
Contributor

@srielau srielau commented Mar 28, 2024

What changes were proposed in this pull request?

The errorCapturingIdentifier rule also accepts identifiers that contain a '-', so that the parser can raise an INVALID_IDENTIFIER error
instead of a generic SYNTAX_ERROR for non-delimited identifiers containing a hyphen.

It is meant to be used wherever the context is not that of an expression.
This PR replaces a few uses of identifier that were previously missed with that rule.
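
To illustrate the intended behavior, here is a minimal Python sketch. It is not Spark's parser and not part of this PR: `ParseError`, `parse_set_catalog`, and the `error_capturing` flag are hypothetical names, and the error class strings are simply the ones mentioned in this description. It only shows how a rule that deliberately matches a hyphenated name can report a more specific error than a rule that does not match it at all.

```python
import re


class ParseError(Exception):
    """Hypothetical error type for this sketch (not a Spark class)."""

    def __init__(self, error_class, message):
        super().__init__(f"[{error_class}] {message}")
        self.error_class = error_class


PLAIN_IDENT = re.compile(r"[A-Za-z0-9_]+")
DELIMITED_IDENT = re.compile(r"`([^`]+)`")
# "Error capturing" form: a plain identifier followed by one or more
# '-'-separated plain identifiers, e.g. my-catalog.
HYPHENATED_IDENT = re.compile(r"[A-Za-z0-9_]+(?:-[A-Za-z0-9_]+)+")


def parse_set_catalog(sql, error_capturing=True):
    """Parse a toy `SET CATALOG <name>` statement and return the name."""
    statement = re.fullmatch(r"\s*SET\s+CATALOG\s+(\S+)\s*", sql, re.IGNORECASE)
    if statement is None:
        raise ParseError("SYNTAX_ERROR", f"cannot parse {sql!r}")
    name = statement.group(1)
    if PLAIN_IDENT.fullmatch(name):
        return name
    delimited = DELIMITED_IDENT.fullmatch(name)
    if delimited:
        # Backtick-delimited names may contain a hyphen.
        return delimited.group(1)
    if error_capturing and HYPHENATED_IDENT.fullmatch(name):
        # The rule matched the hyphenated name on purpose, so the
        # error can name the actual problem.
        raise ParseError(
            "INVALID_IDENTIFIER",
            f"unquoted identifier {name} contains '-'; delimit it with backticks",
        )
    # Without the capturing rule, the hyphen is just an unexpected token.
    raise ParseError("SYNTAX_ERROR", f"syntax error at or near {name!r}")


if __name__ == "__main__":
    for capturing in (False, True):
        try:
            parse_set_catalog("SET CATALOG my-cat", error_capturing=capturing)
        except ParseError as err:
            print(capturing, err)
```

In this sketch, `error_capturing=False` stands in for a grammar position that still uses identifier, and `error_capturing=True` for one that uses errorCapturingIdentifier.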

Why are the changes needed?

Improve error messages for undelimited identifiers with a hyphen.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests in ErrorParserSuite.scala

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Mar 28, 2024
@HyukjinKwon HyukjinKwon changed the title from "[SPARK-47637] Use errorCapturingIdentifier in more places" to "[SPARK-47637][SQL] Use errorCapturingIdentifier in more places" Mar 29, 2024
@@ -76,7 +76,7 @@ statement
     | ctes? dmlStatementNoWith                                 #dmlStatement
     | USE identifierReference                                  #use
     | USE namespace identifierReference                        #useNamespace
-    | SET CATALOG (identifier | stringLit)                     #setCatalog
+    | SET CATALOG (errorCapturingIdentifier | stringLit)       #setCatalog
Member


Have we considered extending:

IDENTIFIER
    : (UNICODE_LETTER | DIGIT | '_')+
    | UNICODE_LETTER+ '://' (UNICODE_LETTER | DIGIT | '_' | '/' | '-' | '.' | '?' | '=' | '&' | '#' | '%')+

to

IDENTIFIER
    : (UNICODE_LETTER | DIGIT | '_' | '-')+
    | UNICODE_LETTER+ '://' (UNICODE_LETTER | DIGIT | '_' | '/' | '-' | '.' | '?' | '=' | '&' | '#' | '%')+

so that we can remove all uses of errorCapturingIdentifier for simplicity?

Contributor Author


We cannot. That would break SELECT c1 - c2 FROM VALUES(1, 2) AS T(c1, c2);

An interesting thought would be to flip the script: rename errorCapturingIdentifier to identifier,
and find another name for the current identifier, such as identifierIsExpression.
But that adds churn.
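
To make the concern about the suggested lexer change concrete, here is a small Python sketch of a longest-match tokenizer. It is a toy, not ANTLR and not Spark's lexer: the `tokenize` helper, the ASCII-only character classes, and the choice to list MINUS before IDENT are all assumptions for illustration. It shows that once '-' is allowed inside an identifier token, an unspaced subtraction such as `c1-c2` collapses into a single identifier.

```python
import re

# Simplified stand-ins for the two IDENTIFIER alternatives discussed above
# (ASCII only; the real rule uses UNICODE_LETTER and has a URI alternative).
CURRENT_IDENT = r"[A-Za-z0-9_]+"
EXTENDED_IDENT = r"[A-Za-z0-9_-]+"


def tokenize(sql, ident_pattern):
    """Longest-match tokenizer; ties go to the rule listed first."""
    spec = [
        ("MINUS", re.compile(r"-")),
        ("IDENT", re.compile(ident_pattern)),
        ("WS", re.compile(r"\s+")),
    ]
    tokens = []
    pos = 0
    while pos < len(sql):
        best_kind, best_match = None, None
        for kind, pattern in spec:
            match = pattern.match(sql, pos)
            # Keep the strictly longer match so earlier rules win ties.
            if match and (best_match is None or match.end() > best_match.end()):
                best_kind, best_match = kind, match
        if best_match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at {pos}")
        if best_kind != "WS":
            tokens.append((best_kind, best_match.group()))
        pos = best_match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("c1-c2", CURRENT_IDENT))
    print(tokenize("c1-c2", EXTENDED_IDENT))
```

With the current pattern, `c1-c2` yields an identifier, a minus, and an identifier. With the extended pattern it yields one identifier, so the expression is no longer a subtraction. In this toy the spaced form `c1 - c2` is unaffected, but only because MINUS is assumed to be listed first.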

@gengliangwang
Member

Thanks, merging to master

sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024
Closes apache#45764 from srielau/SPARK-47637-errorCapturingIdentifier.

Authored-by: Serge Rielau <serge@rielau.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>