Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support conditional queries in query_as! #1491

Closed
wants to merge 32 commits into from

Conversation

NyxCode
Copy link

@NyxCode NyxCode commented Oct 13, 2021

@abonander
Copy link
Collaborator

If you rebase it should fix the MySQL build failure.

@NyxCode
Copy link
Author

NyxCode commented Nov 18, 2021

Is anyone willing to take a look at this and provide some feedback?

@raffomania
Copy link
Contributor

This is super exciting! I'll try it out and will report :)

@raffomania
Copy link
Contributor

I've tried to compile this but couldn't get it to work. I'm using postgres and get the same error as the postgres CI job:

error: expected `".."`, `if { .. }` or `match .. { .. }`

The error appeared even after I rebased the branch on master.

@raffomania
Copy link
Contributor

raffomania commented Jan 31, 2022

I've managed to fix the compilation error. It seems like there was a bug where arguments after the query segments would not get parsed but caused an error.

I've fixed it by only parsing query segments while they parse successfully, then parsing arguments. See this commit for the fix. It doesn't seem ideal to me, as malformed query segment syntax will probably result in an argument parsing error now, swallowing the query segment parsing error :/

Additionally, in my project the compiler complained about not finding the sqlx_macros crate. I've fixed that by updating the calls to expand_query! to sqlx::sqlx_macros::expand_query instead of just sqlx_macros::expand_query.

@NyxCode
Copy link
Author

NyxCode commented Mar 2, 2022

I've tried to compile this but couldn't get it to work. I'm using postgres and get the same error as the postgres CI job:

error: expected `".."`, `if { .. }` or `match .. { .. }`

The error appeared even after I rebased the branch on master.

Turns out this was caused by a trailing comma, for example query_as!(Article, "SELECT * FROM articles",). Fixed now.

@NyxCode NyxCode changed the title WIP: conditional_query_as! support conditional queries in query_as! Mar 9, 2022
@NyxCode
Copy link
Author

NyxCode commented Mar 9, 2022

As far as I'm concerned, this PR is ready to be merged.
I would appreciate a review.

@jayrave
Copy link

jayrave commented Mar 20, 2022

This is interesting work & something that would be useful for me in a handful of scenarios. For one of those I tried to put this to use & gathered a few preliminary stats that might be useful to be considered:

Setup

  • Using bae5dd6 commit of this PR
  • Using postgres v13.3 ( in non-offline mode)
  • Reading 26 fields from a table

I was planning to use conditional queries to allow different filtering & sorting options for this query:

  • filtering one or many fields (up to 7)
  • ordering based on one of 6 fields

This translated to such a (pseudo-)query for me:

sqlx::query_as!(
  Account,
  r#"
      SELECT
        "accounts"."id",
        "accounts"."field1",
        "accounts"."field2",
        "accounts"."field3",
        ...
      FROM
        accounts
      WHERE
        accounts.id IS NOT NULL
  "#
  if let Some(field1_filter) = Some(field1_filter) {
      "AND accounts.field1 LIKE {field1_filter}"
  }
  ...
  //
  // 6 such if statements
  //
  ...
  match order_by_key {
      FIELD1 => "ORDER BY accounts.field1",
      FIELD2 => "ORDER BY accounts.field2",
      FIELD3 => "ORDER BY accounts.field3",
      FIELD4 => "ORDER BY accounts.field4",
      FIELD5 => "ORDER BY accounts.field5",
      FIELD6 => "ORDER BY accounts.field6",
  }
)

Commented out parts of queries & expanded the generated code to get a feel for what it looks like for reading 26 fields from the DB - (branches in this stat mean the number of variants in the generated ConditionalMap enum):

  • 1 if => 2 branches, 174 LOC
  • 2 ifs => 4 branches, 350 LOC
  • 3 ifs => 8 branches, 722 LOC
  • 4 ifs => 16 branches, 1442 LOC
  • 5 ifs => 32 branches, 2864 LOC
  • 6 ifs => 64 branches, 5708 LOC
  • 7 ifs => 128 branches, 11396 LOC
  • 0 ifs & 1 match with 6 arms => 6 branches, 438 LOC
  • 2 ifs & 1 match with 6 arms => 24 branches, 2017 LOC

As it can be seen, growth is exponential because of the required nesting to cover all the required combinations. I didn't note down the time it took for this code generation (it ranged from almost instantaneous to taking a few seconds as the number of branches grew). I tried generating this with "7 ifs & 1 match with 6 arms" => this would have led to 768 branches & ~66k LOC but the compiler hadn't finished expanding even after a few minutes (note: I was writing out the expanded code to console which probably was slowing down the process even more).

One thing of note is that there was a lot of repetition across the different ConditionalMap branches (~60 LOC for reading from row & mapping to the struct) - happens since the first round of expansion calls sqlx_macros::expand_query! which then generates code for every branch separately. Perhaps, some stuff could be refactored to build the QueryAs per branch & extracting out the mapping to be done in one place (under the assumption that's possible & rust won't be able to optimize the generated code by itself).

Nevertheless, it is an interesting approach towards having compile time checked dynamic queries 👏

@NyxCode
Copy link
Author

NyxCode commented Mar 20, 2022

@jayrave Thanks for the feedback, appreciate it!
The complexity is definetely something which should be kept in mind when using this. I documented this here.
That being said, I do think there is quite a lot of headroom when it comes to optimizing the macro output. That does not change the O(2^n) complexity, but would help a bit.
Also, I will have to benchmark where the compiler actually spends it's time - I suspect most of it is checking every possible query against the database.

@@ -3,8 +3,6 @@ name: SQLx
on:
pull_request:
push:
branches:
- master
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this?

@abonander
Copy link
Collaborator

All righty, I'm finally looking at this. A few notes so far:

As for how that should work with compile-time checking, yeah I dunno. You would probably still want to check every possible combination which is still gonna have an exponential blowup, and compile times are already a source of complaints.

I know it's kinda late for this, but I'm wondering if this isn't the best approach. I would kinda prefer working on making it easier to build the conditionals into the SQL so we only have to send a single statement, which will play much better with caching as well.

Having generalized placeholders that show which variables are being bound inline would help a lot with code readability there, which I think is one big reason why people still seem hesitant to write very complex queries.

@NyxCode
Copy link
Author

NyxCode commented Mar 24, 2022

@abonander Thanks for taking a look! I have noticed that there is major interest in this feature, so I would be delighted if we could get this functionality into sqlx, in one form or an other.

I know it's kinda late for this, but I'm wondering if this isn't the best approach. I would kinda prefer working on making it easier to build the conditionals into the SQL so we only have to send a single statement, which will play much better with caching as well.

Could you expand on that a bit? Are you proposing that we transpile the rust if/match clauses to SQL? I'm definetely open to alternative approaches, though I don't see a categorically different approach yet.

I'm concerned about potential confusion with trailing conditionals and positional bind parameters, we should probably forbid positional bind parameters entirely if the user is using conditions. Since it looks like you've partially implemented #875 that's less problematic than it seems.

This is already implemented here

I would like this to be better integrated into the rest of the implementation, namely so that it works with the other variants like query!()

I agree. This does however raise some questions, for example if we could somehow allow the branches of a conditional query! call to result in different outputs. I do feel like filtering and sorting is the biggest usecase for conditional queries, which is imho done most often with query_as!, so maybe we could start off with that.
However, I'm open to suggestions on how to improve the integration of this feature into the rest of sqlx. This was a point I was really unsure about when implementing it.

Regarding compile times & the exponential blowup

I agree this is a concern, but there doesn't seem to be a way around this while still having compile-time checking. I believe that the compile-time checks are what sets this apart from a query builder like #1594, and I don't consider this to be an alternative to that. I would like to setup some benchmarks and explore possible optimizations (for example, could we check the queries in paralell?). If this is not a show-stopper, we should probably first tackle the other questions before getting to performance.

@abonander
Copy link
Collaborator

Could you expand on that a bit?

Basically option 1 from #1488. It's actually a lot more useful of a pattern than you'd think, but it does still come up short sometimes. I feel like spending more time investigating that avenue is a good idea as it lets us avoid the major drawbacks of this approach. I think the exponential blowup is unfortunately a fundamental flaw here and a very sneaky footgun that I'd like to avoid straight-up handing to the user if we can.

for example if we could somehow allow the branches of a conditional query! call to result in different outputs.

For sanity's sake I'm gonna say that this should be explicitly a non-feature. If a condition causes a column to be omitted then it should just be explicitly optional for simple cases and for complex cases the user is probably better off writing a separate query. In the same vein as your "optimize later" sentiment, let's keep the scope of the feature as small as possible.

@NyxCode
Copy link
Author

NyxCode commented Mar 24, 2022

@abonander I believe that what I outlined in the 1st point in #1488 is super limited in comparison to the conditionals.
Maybe I should have outlined a few more examples of why there really is no alternative to expanding every possible query in some cases. Here are a few examples which come to mind:

Selecting from a different tables

sqlx::query_as!(
    Article,
    "SELECT * FROM"
    match select_from {
      Articles => "articles",
      Archive => "archive",
      Deleted => "deleted"
    }
)

Ordering/Pagination

sqlx::query_as!(
    Products,
    "SELECT * FROM products"
    "ORDER BY"
    match order_by {
        Name => "name",
        Date => "date",
    }
    match direction {
        Ascending => "ASC",
        Descending => "DESC",
    }
)

There are a lot of other pretty common usecases like this, where you just cannot capture everything you want to do in a single query. For very specific usecases, you can get away with something like .. WHERE ($1::type IS NULL OR $1 = column), but that's super limiting.

We should clearly document the limitations of conditionals, and emit warnings if too many branches are used.

Also, users always have the choice to ditch compile-time checking and construct the query at runtime (which is what you'd currently do). I could imagine exposing a "query builder" macro, which accepts the same syntax, but instead returns a String to be passed to the query functions.
That would somewhat keep the symmetry between the query functions and their macro counterparts.

If you'd like, i would happily discuss this with you further on discord.

@abonander
Copy link
Collaborator

In fact, both those patterns are possible with conditionals inside the query, though it can be unwieldy (especially for MySQL which only supports ? and so there's a lot of duplicate bindings):

SELECT * FROM articles
WHERE $1 = 'articles'
UNION ALL
SELECT * FROM archive
WHERE $1 = 'archive'
UNION ALL
SELECT * FROM deleted
WHERE $1 = 'deleted'

I personally haven't come across a situation where I needed this exact pattern, however; if multiple tables contain similar enough data that it's likely to be selected in the same route, then it should be just a single table with the structuring done within the table with something like a status field that's indexed. If some rows have more data than others then that should be a simple 1:1 left join. Honestly, needing a pattern like this tells me that there's probably architectural issues with your database.

SELECT * FROM products
ORDER BY
CASE
    WHEN $1 = 'asc' AND $2 = 'name' THEN name,
END,
CASE
    WHEN $1 = 'desc' AND $2 = 'name' THEN name,
END DESC,
CASE
    WHEN $1 = 'asc' AND $2 = 'date' THEN date,
END
CASE
    WHEN $1 = 'desc' AND $2 = 'date' THEN date,
END DESC

This one's more annoying since it does come up a lot for us, and depending on the database it could require each arm of the CASE to evaluate to the same type (Postgres) or coerce the arms into a common type (strings probably) that then may not touch indexes properly (MySQL), so each condition still needs to be duplicated.

However both of these examples are still just one query to send to the database for parsing, and depending on the database the query planner should be able to introspect the constant conditions here.

The former case is one I'm not really interested in addressing, but the latter is. I'd love to workshop a macro to specifically make conditional ORDER BYs easier to write that expands into something like the above.

@TmLev
Copy link

TmLev commented Jul 13, 2022

What's the status of this pull request?

@abonander
Copy link
Collaborator

This is unlikely to be merged as still think the possibility of a combinatorial explosion of prepared statements and generated code is a footgun I'd rather not hand our users. Perhaps I'd reconsider if macros could emit lint warnings, but I think the time would be better spent investigating how to encode the conditionals in a single query.

The approach taken in this PR is also relatively straightforward to implement on top of the SQLx facade and so doesn't necessarily need to live here.

@abonander abonander closed this Jul 14, 2022
@abonander
Copy link
Collaborator

We also have a query builder API that makes it possible to do this sort of thing at runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants