Replace proof-of-sql-parser with sqlparser. #235
/bounty $10000 |
💎 $10,000 bounty • Space and Time
Thank you for contributing to spaceandtimelabs/sxt-proof-of-sql!
|
/attempt #235
|
@JayWhite2357 I will connect on discord for the same |
/attempt #235 |
Hi @JayWhite2357 and @iajoiner, I'd like to proceed with the transition from the LALRPOP-based parser to sqlparser for the proof-of-sql-parser crate. I have gone through the current implementation and workings of the proof-of-sql parser. Here's the approach I have in mind:
Before moving forward with this plan, I wanted to check in and ensure this approach aligns with the migration goals. If there are any specific concerns or alternative suggestions, I'd be happy to adjust the plan. I look forward to your feedback! |
I'm looking into it a bit, and I feel like My initial intent was for the sqlparser AST to replace the intermediate AST. This would mean that Only replacing lalrpop with sqlparser, but not replacing intermediate AST is an interesting idea that I hadn't thought of. Perhaps it makes sense as a stepping stone, but I feel like it can't be the end goal here. @iajoiner might have some feedback on this. |
Thanks @JayWhite2357 for your view on this. @iajoiner Any insights on this? |
I chatted with him. He's on the same page. The goal here should be to remove the intermediate AST altogether. |
Got it! I just started with basic SELECT parsing logic to see how sqlparser works with an example. @JayWhite2357 I have joined Discord. If we have a thread in a related channel on Discord, we can easily track all progress and discuss it further going forward. |
@varshith257 I can chat with you on Discord. What's your handle there? |
@iajoiner ID: vamshi_257 |
Cool! Just sent you a message there. |
/attempt #235 |
/attempt #235 |
@JayWhite2357 Is @iajoiner on holiday? I am also thinking of connecting with you on Discord. Here's my ID: vamshi_257 |
@varshith257 we're all a bit swamped at the moment. I connected with you on discord as well. |
The main effort here should be adding a function here: The function should look like this:
pub fn try_new_from_sqlparser(
    ast: sqlparser::ast::Query,
    default_schema: Identifier,
    schema_accessor: &dyn SchemaAccessor,
) -> ConversionResult<Self>;
This should be able to be added without changing any existing code. Once it is in place and well tested, we can rename it to There are probably some complexities here that I'm missing. It may not be as straightforward as this. (For example, since |
@varshith257 @TomBebb @deependujha
|
A few more ideas:
|
@varshith257 @TomBebb @deependujha Let's get this done ASAP and we will be more than willing to help unblock you guys if necessary. Please feel free to shoot me a message either here or on Discord (username: |
Sure @iajoiner. Let's aim to complete this ASAP, within a few weeks. |
Just discussed this with @iajoiner some more. This should be able to be added without changing any existing code: specifically, the |
@JayWhite2357 @iajoiner, I have DMed you on Discord about a blocker. Please have a look. |
@JayWhite2357 @animeshd9 @deependujha @TomBebb @varshith257
pub fn try_new(
    ast: sqlparser::ast::Query,
    default_schema: sqlparser::ast::Ident,
    schema_accessor: &dyn SchemaAccessor,
) -> ConversionResult<Self>;
There should be no construct of
|
@animeshd9 @TomBebb @varshith257 Since we need The rest of the subtasks will be added as separate issues linked back here with separate bounties. We will set up deadlines so that you guys know when each task has to be completed before the bounty period ends and we start to work on the task internally. We will usually give a heads up at least a week prior to the deadline for each subtask. |
Please be sure to look over the pull request guidelines here: https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/CONTRIBUTING.md#submit-pr.
# Please go through the following checklist
- [x] The PR title and commit messages adhere to guidelines here: https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/CONTRIBUTING.md. In particular `!` is used if and only if at least one breaking change has been introduced.
- [x] I have run the ci check script with `source scripts/run_ci_checks.sh`.
# Rationale for this change
In order to allow #235 to be done in time for JOIN-related integrations we need to get `proof-of-sql-parser` -> `sqlparser` adaptions done. Large parts of the work going forward can then become more manageable.
# What changes are included in this PR?
- add `sqlparser.rs` with adaptations
# Are these changes tested?
Yes
…parser::ast::UnaryOp` in the proof-of-sql crate (#363)
# Please go through the following checklist
- [x] The PR title and commit messages adhere to guidelines here: https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/CONTRIBUTING.md. In particular `!` is used if and only if at least one breaking change has been introduced.
- [x] I have run the ci check script with `source scripts/run_ci_checks.sh`.
# Rationale for this change
This PR addresses the need to replace the `proof_of_sql_parser::intermediate_ast::UnaryOp` with the `sqlparser::ast::UnaryOp` in the `proof-of-sql` crate as part of a larger transition toward integrating the `sqlparser` crate. This change is a subtask of issue #235, with the main goal of streamlining the repository by switching to the `sqlparser` crate and gradually replacing intermediary constructs like `proof_of_sql_parser::intermediate_ast` with `sqlparser::ast`.
# What changes are included in this PR?
- All instances of `proof_of_sql_parser::intermediate_ast::UnaryOp` have been replaced with `sqlparser::ast::UnaryOp`.
- Every usage of `UnaryOp` has been updated to maintain the original functionality, ensuring no changes to the logic or behavior.
- Any unsupported `UnaryOp` variants from `sqlparser` have been appropriately handled using existing error handling mechanisms (i.e., the `Unsupported` variant in `ExpressionEvaluationError`).
# Are these changes tested?
Yes
Part of #235
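The fallback described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `UnaryOp` below is a local stand-in that mirrors a few variant names of `sqlparser::ast::UnaryOp`, the `expression` field on `Unsupported` is assumed, and `eval_unary_on_bool` is a hypothetical helper, not a function from the `proof-of-sql` crate.

```rust
// Local stand-in for a subset of `sqlparser::ast::UnaryOp`.
// The real enum has more variants; only a few are mirrored here.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum UnaryOp {
    Plus,
    Minus,
    Not,
    PGBitwiseNot,
}

// Stand-in for the error type named in the commit message.
// The `expression` field is an assumption made for this sketch.
#[derive(Debug, PartialEq)]
enum ExpressionEvaluationError {
    Unsupported { expression: String },
}

// Hypothetical helper: apply a unary operator to a boolean operand.
// Only `NOT` is handled; every other variant falls through to `Unsupported`
// instead of panicking, so newly exposed operators fail gracefully.
fn eval_unary_on_bool(op: UnaryOp, operand: bool) -> Result<bool, ExpressionEvaluationError> {
    match op {
        UnaryOp::Not => Ok(!operand),
        other => Err(ExpressionEvaluationError::Unsupported {
            expression: format!("unary operator {other:?} on a boolean"),
        }),
    }
}
```

The catch-all arm is the relevant design choice: because the upstream enum is larger than the old in-house one, a wildcard that maps to an error keeps the match exhaustive without silently accepting operators that were never supported.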
…lparser::ast::BinaryOp` in the proof-of-sql crate (#362)
# Please go through the following checklist
- [x] The PR title and commit messages adhere to guidelines here: https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/CONTRIBUTING.md. In particular `!` is used if and only if at least one breaking change has been introduced.
- [x] I have run the ci check script with `source scripts/run_ci_checks.sh`.
# Rationale for this change
This PR addresses the need to replace the `proof_of_sql_parser::intermediate_ast::BinaryOp` with the `sqlparser::ast::BinaryOp` in the `proof-of-sql` crate as part of a larger transition toward integrating the `sqlparser` crate. This change is a subtask of issue #235, with the main goal of streamlining the repository by switching to the `sqlparser` crate and gradually replacing intermediary constructs like `proof_of_sql_parser::intermediate_ast` with `sqlparser::ast`.
# What changes are included in this PR?
- All instances of `proof_of_sql_parser::intermediate_ast::BinaryOp` have been replaced with `sqlparser::ast::BinaryOp`.
- Every usage of `BinaryOp` has been updated to maintain the original functionality, ensuring no changes to the logic or behavior.
- Any unsupported `BinaryOp` variants from `sqlparser` have been appropriately handled using existing error handling mechanisms (i.e., the `Unsupported` variant in `ExpressionEvaluationError`).
# Are these changes tested?
Yes
Closes #349
Part of #235
The deadline for this ticket is Jan 5th 2025. |
…sqlparser::ast::Ident` in the proof-of-sql crate (#382)
# Please go through the following checklist
- [x] The PR title and commit messages adhere to guidelines here: https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/CONTRIBUTING.md. In particular `!` is used if and only if at least one breaking change has been introduced.
- [x] I have run the ci check script with `source scripts/run_ci_checks.sh`.
# Rationale for this change
This PR addresses the need to replace the `proof_of_sql_parser::Identifier` with the `sqlparser::ast::Ident` in the `proof-of-sql` crate as part of a larger transition toward integrating the `sqlparser` crate. This change is a subtask of issue #235, with the main goal of streamlining the repository by switching to the `sqlparser` crate and gradually replacing intermediary constructs like `proof_of_sql_parser::intermediate_ast` with `sqlparser::ast`.
# What changes are included in this PR?
- All instances of `proof_of_sql_parser::Identifier` have been replaced with `sqlparser::ast::Ident`.
- A few places still require an `Identifier` (e.g. `Expression::Column`); these depend on `Identifier` and will be migrated when the Exprs are refactored.
- Every usage of `Identifier` has been updated to maintain the original functionality, ensuring no changes to the logic or behavior.
- The breaking change here is that `Ident` doesn't implement the `Copy` trait, so clones were needed in the places where values are moved.
- Deleted the test `we_cannot_convert_a_record_batch_if_it_has_repeated_column_names` because `sqlparser` differentiates between uppercase and lowercase identifiers. Case normalization is no longer applied and `sqlparser` treats `a` and `A` as distinct identifiers.
- Examples are updated to align with `sqlparser`'s case-sensitive behavior.
# Are these changes tested?
Yes
Part of #235
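The two behavioral points in this commit message (no `Copy`, and case-sensitive comparison) can be illustrated with a small sketch. The `Ident` struct below is a simplified local stand-in for `sqlparser::ast::Ident`, keeping only a `value` and `quote_style` field; `column_with_alias` and `has_repeated_names` are hypothetical helpers written for this example, not code from the repository.

```rust
use std::collections::HashSet;

// Simplified stand-in for `sqlparser::ast::Ident`.
// It owns a `String`, so it can derive `Clone` but cannot be `Copy`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Ident {
    value: String,
    quote_style: Option<char>,
}

impl Ident {
    fn new(value: &str) -> Self {
        Ident { value: value.to_string(), quote_style: None }
    }
}

// Hypothetical helper: use one identifier as both column name and alias.
// With a `Copy` type the value could simply be reused; here each use
// that takes ownership needs an explicit clone.
fn column_with_alias(name: &Ident) -> (Ident, Ident) {
    (name.clone(), name.clone())
}

// Hypothetical helper: detect exact duplicates among column names.
// No case normalization is applied, so `a` and `A` count as different names.
fn has_repeated_names(names: &[Ident]) -> bool {
    let mut seen = HashSet::new();
    names.iter().any(|name| !seen.insert(name))
}
```

Under this stand-in, a duplicate check over `a` and `A` reports no repetition, which is consistent with the reason given above for deleting the repeated-column-names test.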
# Please go through the following checklist
- [ ] The PR title and commit messages adhere to guidelines here: https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/CONTRIBUTING.md. In particular `!` is used if and only if at least one breaking change has been introduced.
- [ ] I have run the ci check script with `source scripts/run_ci_checks.sh`.
# Rationale for this change
To update docs to reflect the migration of Identifier -> Ident
# What changes are included in this PR?
- Updated docs in the proof-of-sql crate with Ident
# Are these changes tested?
Part of #235
I was out, so let's extend to Jan 12th 2025 end of day ET. |
Background and Motivation
Currently, we have an in-house parser that is built on the `lalrpop` parser-generator. This has been good while the supported syntax has been simple. However, as the supported syntax has grown, we need a more comprehensive parser.
The `sqlparser` crate is the parser used by DataFusion, which is part of the Arrow ecosystem. It is a feature-rich parser that ultimately will require less code maintenance. It is `no_std` compatible, so there should be no issues integrating it.
Changes Required
`proof-of-sql-parser` usage within the `proof-of-sql` crate with `sqlparser` usage.
`proof-of-sql` crate.