Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nullable column support #183

Open
2 tasks
JayWhite2357 opened this issue Sep 25, 2024 · 23 comments
Open
2 tasks

Add nullable column support #183

JayWhite2357 opened this issue Sep 25, 2024 · 23 comments
Labels
💎 Bounty enhancement New feature or request

Comments

@JayWhite2357
Copy link
Contributor

JayWhite2357 commented Sep 25, 2024

Background and Motivation

Currently, we have not implemented support for columns with null values. This is a very useful and ubiquitous primitive for any database engine.

Changes Required

  • Support adding a flag (or some other mechanism) to the columns types in order to support nullable columns. Logically, this is the same as adding a new column datatype for each existing datatype.
  • Support existing operations on nullable columns. (e.g. we should be able to add a nullable bigint to a non-nullable bigint)
Copy link

algora-pbc bot commented Sep 25, 2024

💎 $5,000 bounty • Space and Time

Steps to solve:

  1. Start working: (Optional) Comment /attempt #183 with your implementation plan. Note: we will only assign an issue if you include an implementation plan with a time estimate. Additionally, to be assigned an issue, you must have previously contributed to the project. You can still work on an issue and submit a PR without being assigned.
  2. Submit work: Create a pull request including /claim #183 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to spaceandtimelabs/sxt-proof-of-sql!

Add a bountyShare on socials

Attempt Started (GMT+0) Solution
🟢 @ssddOnTop Sep 25, 2024, 7:55:21 PM WIP
🟢 @TomBebb Oct 17, 2024, 6:28:37 PM WIP

@JayWhite2357
Copy link
Contributor Author

JayWhite2357 commented Sep 25, 2024

@iajoiner Can you scope out this issue a bit more and come up with a concrete plan.

Alternatively, if someone claims the bounty and is able to come up with a satisfactory plan themselves, I will tip some extra when they complete the bounty.

@ssddOnTop
Copy link

ssddOnTop commented Sep 25, 2024

/attempt #183

Algora profile Completed bounties Tech Active attempts Options
@ssddOnTop 76 bounties from 1 project
Rust, Java,
C & more
Cancel attempt

@ssddOnTop
Copy link

@JayWhite2357 I am new to the project.. can you provide some sample commands or something to test against? Something that can help me to navigate through the code easily and more importantly, understand the issue properly.

@iajoiner iajoiner added the enhancement New feature or request label Sep 25, 2024
@JayWhite2357
Copy link
Contributor Author

JayWhite2357 commented Sep 25, 2024

Take a look at the tests here: https://github.com/spaceandtimelabs/sxt-proof-of-sql/tree/main/crates/proof-of-sql/tests.
These demonstrate how to use the proof-of-sql crate, which is the top level crate.

@brainless
Copy link

brainless commented Sep 26, 2024

@JayWhite2357, @iajoiner I am not planning to attempt this but I would happy to scope this out. I will look into your linked tests.

@JayWhite2357
Copy link
Contributor Author

JayWhite2357 commented Sep 26, 2024

@brainless I can't promise a tip for just scoping it out. The bounty is specifically for completing the issue.
This is a potentially complex issue, where the correct, full design will likely not be clear until the code is complete.
To this point, @ssddOnTop (or anyone else wishing to attempt this), you may wish to get feedback from @iajoiner and myself on any design before you actually go through the effort of implementing it.

@brainless
Copy link

@JayWhite2357 sorry for the mis-communication. I did not mean a tip at all. I feel it is a great way for me to learn about the project before I want to take any bounty in the future.

@JayWhite2357
Copy link
Contributor Author

@JayWhite2357 sorry for the mis-communication. I did not mean a tip at all. I feel it is a great way for me to learn about the project before I want to take any bounty in the future.

No worries! I wasn't sure, and wanted to make sure expectations were set correctly. That sounds great!

@algora-pbc algora-pbc bot removed the 💎 Bounty label Sep 26, 2024
@ssddOnTop
Copy link

To this point, @ssddOnTop (or anyone else wishing to attempt this), you may wish to get feedback from (avoiding ping) and myself on any design before you actually go through the effort of implementing it.

Sorry for delays.. I just moved in to a new place and I don't have access to my PC, and it's very difficult to do anything on arm based mac and docker.

I'll definitely start making some progress by tomorrow.

Meanwhile @JayWhite2357 can you provide a link to the discord channel for easier communication?

@JayWhite2357
Copy link
Contributor Author

/bounty $5000

@ssddOnTop
Copy link

Is there any way to run this on cpu? @JayWhite2357

@iajoiner
Copy link
Contributor

iajoiner commented Sep 27, 2024

Here are some basic thoughts.

  • This issue can be resolved in very complex and unmaintainable manners or it can be resolved in simple ones. We strongly prefer the latter.
  • A nullable column, c, is basically two different non-nullable columns, the values column c_v and the presence column c_p which is Boolean. Using this we can significantly simplify this task.
  • Any unary operation on a nullable column c is really just the operation on c_v while the presence column of the result is simply c_p.
  • Similarly for binary operations with at least one side non-nullable we can trivially reuse results from the same binary operation on non-nullable columns.
  • For binary operations with both sides a and b nullable we are just applying the operation on a_v and b_v and at the same time apply AND on a_p and b_p. Since AND is already provable we can just reuse the result.

I will scope out the problem later today.

@JayWhite2357
Copy link
Contributor Author

Is there any way to run this on cpu? @JayWhite2357

Yes. Either set the environment variable BLITZAR_BACKEND=cpu, which still requires linux. Or, disable the "blitzar" features, which should run on any os.

@iajoiner
Copy link
Contributor

This issue should be split into multiple PRs. I ordered the tasks so that tasks below are not dependent on tasks above and the code with incremental changes should still compile and run properly with CI passing.

  • Allow nullability for columns in base/database.
    • Add NullableColumn that contains a values Column, and an optional boolean presence slice. presence being None indicates that the column is not nullable.
    • Add NullableOwnedColumn that contains a values OwnedColumn and an optional boolean presence vector. presence being None indicates that the column is not nullable.
      • It might be a good idea to rename NullableColumn to Column and the current Column to NonnullableColumn etc. Make sure to make it a breaking change.
    • Implement column operations and expression evaluation for nullable columns. Note that you should reuse the code for non-nullable ones and follow the guidelines above. Just use a match and your work will be mostly reduced to doing an AND on presence columns if both sides are nullable.
    • Extend Arrow conversions to nullable columns.
  • Allow nullability for tables in base/database.
    • Make columns in OwnedTable nullable.
    • Make sure Arrow conversions and utils still work after the change.
  • Allow nullability in sql/proof_exprs and sql/proof_plans.
    • Implement IS NULL, IS NOT NULL which is basically an EqualExpr on presence columns.
    • Implement IS TRUE which means is not false and not null which we need for WHERE clauses.
    • Extend proofs for unary and binary operations to nullable columns. Please do so with minimal changes.
      • It might be a good idea to refactor the code and make all binary operations use a generic BinaryOperationExpr and add nullability to it. Same for unary operations (just NOT for now, more to come).
    • None of the ProofPlans should require serious changes to be nullable other than accommodation of ProofExpr changes.
      • If the workload here is particularly high please split changes to ProofPlan.
  • Allow nullability in sql/postprocessing
    • Most of the work should already be complete when you do the work for columns. Do make sure test utilities etc are updated correctly.
  • Allow nullability in sql/parse.
    • Use implicit IS TRUE in WHERE clauses.
    • Get everything to work. This most likely consists of lots of miscellaneous tasks.
    • Add additional tests for tables with nullability here.
      • After this the task proof-of-sql crate is done.
  • Allow nullability in proof-of-sql-parser
    • Make sure to do this task AFTER the work for proof-of-sql is complete since otherwise it will break the proof-of-sql crate.
    • Make sql.lalrpop accept nulls.
    • Make corresponding changes in intermediate_ast.rs and elsewhere. Get the tests to run.
    • Add integration tests for nullable tables using both InnerProductProof and DoryEvaluationProof.

@ssddOnTop
Copy link

why do we have (De)serialize on ColumnType? Isn't that an internal struct?

I am confused on the flow of code

@iajoiner @JayWhite2357

@JayWhite2357
Copy link
Contributor Author

why do we have (De)serialize on ColumnType? Isn't that an internal struct?

I am confused on the flow of code

@iajoiner @JayWhite2357

Ultimately we need (De)serialize on ColumnType because we need it on DynProofPlan.

@AmitKuMaurya
Copy link

AmitKuMaurya commented Oct 11, 2024

/assign

@digital-phoenix
Copy link

@iajoiner what should the result of a binary operation be when one of the values is null?

@TomBebb
Copy link

TomBebb commented Oct 17, 2024

/attempt #183

@JayWhite2357
Copy link
Contributor Author

JayWhite2357 commented Oct 19, 2024

@iajoiner what should the result of a binary operation be when one of the values is null?

@digital-phoenix

In general, we should aim to be as similar to PostgreSQL as possible.

So, 1+NULL should be NULL. However, bool logic is not as straightforward in PostgreSQL (and most engines): https://www.postgresql.org/docs/current/functions-logical.html#FUNCTIONS-LOGICAL.

@maujim
Copy link

maujim commented Nov 4, 2024

Do you have a link to the Discord @JayWhite2357? I'd like to join and work on this bounty

@iajoiner
Copy link
Contributor

iajoiner commented Dec 2, 2024

@maujim Sorry for my late reply. Here we go: https://discord.gg/qnKtErW3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💎 Bounty enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

8 participants