-
Notifications
You must be signed in to change notification settings - Fork 574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Dialect
level precedence, update Postgres Dialect
to match Postgres
#1360
Conversation
The above questions stand, but I think this is in a state where it could at least be reviewed. @alamb could you kick off tests. I'm going to use it in Logfire via a git ref now. |
@@ -300,13 +304,172 @@ pub trait Dialect: Debug + Any { | |||
// return None to fall back to the default behavior | |||
None | |||
} | |||
|
|||
/// Get the precedence of the next token. This "full" method means all precedence logic and remain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved to Dialect
so dialects can change the precedence without all the numeric constants being shared between modules.
it's implemented as a separate function from get_next_precedence
:
- To avoid breaking the API
- To allow dialects to customise precedence but optionally fall back to the default behaviour, as demonstrated
Snowflake
.
|
||
/// The following precedence values are used directly by `Parse` or in dialects, | ||
/// so have to be made public by the dialect. | ||
fn prec_double_colon(&self) -> u8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are the precedence values that either Parser
needs to know about, or are used by implementations of the trait, e.g. prec_double_colon
is used by Snowflake
.
} | ||
|
||
// Define the lexical Precedence of operators. | ||
// | ||
// Uses (APPROXIMATELY) <https://www.postgresql.org/docs/7.0/operators.htm#AEN2026> as a reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement really isn't true, hence I added APPROXIMATELY
.
We could rewrite to "was originally inspired by" or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine
// there's no XOR operator in PostgreSQL, but support it here to avoid breaking tests | ||
const XOR_PREC: u8 = 75; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what to do about this, if we remove XOR
logic from Postgres, one tests fails, but I guess people might well be using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think we should just leave it in unless there is a compelling reason to take it out
I started the tests @samuelcolvin I haven't had a chance to review this PR yet, but the basic idea of chaning to follow postgres more closely, but giving people the ability to use the old behavior if they want (via a custom Dialect trait), makes sense to me. Maybe @iffyio @jmhain or @lovasoa have an opinion they would like to share |
Pull Request Test Coverage Report for Build 10205949279Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
Currently it's the other way around: default behaviour is unchanged, |
Yes, sorry I meant that the default behavior of |
Hi all, I'd love to get this merged - having better compatibility with postgreSQL is pretty important to use at Pydantic Logfire. We've been trying this for a few days on Logfire via a branch based of the released version of |
Thanks for the ping @samuelcolvin -- reviewing now |
Dialect
level precedence, update Postgres Dialect
to match Postgres
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @samuelcolvin -- I think this looks great to me. I have some ideas on how to improve the documentation, but will make those in a follow on PR
I also updated the PR description to try and highlight the movement of precedence to the Dialect
} | ||
|
||
// Define the lexical Precedence of operators. | ||
// | ||
// Uses (APPROXIMATELY) <https://www.postgresql.org/docs/7.0/operators.htm#AEN2026> as a reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine
// there's no XOR operator in PostgreSQL, but support it here to avoid breaking tests | ||
const XOR_PREC: u8 = 75; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think we should just leave it in unless there is a compelling reason to take it out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the late review
projection: vec![SelectItem::UnnamedExpr(Expr::BinaryOp { | ||
left: Box::new(Expr::BinaryOp { | ||
left: Box::new(Expr::Identifier(Ident { | ||
value: "foo".to_string(), | ||
quote_style: None, | ||
})), | ||
op: arrow_operator, | ||
right: Box::new(Expr::Value(Value::SingleQuotedString("bar".to_string()))), | ||
}), | ||
op: BinaryOperator::Eq, | ||
right: Box::new(Expr::Value(Value::SingleQuotedString("spam".to_string()))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could avoid the verbosity by using verified_expr
instead of verified_stmt
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a great idea -- I made a PR with this change: #1367
fn get_next_precedence(&self, parser: &Parser) -> Option<Result<u8, ParserError>> { | ||
let token = parser.peek_token(); | ||
debug!("get_next_precedence() {:?}", token); | ||
|
||
let precedence = match token.token { | ||
Token::Word(w) if w.keyword == Keyword::OR => OR_PREC, | ||
Token::Word(w) if w.keyword == Keyword::XOR => XOR_PREC, | ||
Token::Word(w) if w.keyword == Keyword::AND => AND_PREC, | ||
Token::Word(w) if w.keyword == Keyword::AT => { | ||
match ( | ||
parser.peek_nth_token(1).token, | ||
parser.peek_nth_token(2).token, | ||
) { | ||
(Token::Word(w), Token::Word(w2)) | ||
if w.keyword == Keyword::TIME && w2.keyword == Keyword::ZONE => | ||
{ | ||
AT_TZ_PREC | ||
} | ||
_ => self.prec_unknown(), | ||
} | ||
} | ||
|
||
Token::Word(w) if w.keyword == Keyword::NOT => match parser.peek_nth_token(1).token { | ||
// The precedence of NOT varies depending on keyword that | ||
// follows it. If it is followed by IN, BETWEEN, or LIKE, | ||
// it takes on the precedence of those tokens. Otherwise, it | ||
// is not an infix operator, and therefore has zero | ||
// precedence. | ||
Token::Word(w) if w.keyword == Keyword::IN => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::BETWEEN => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::LIKE => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::ILIKE => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::RLIKE => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::REGEXP => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::SIMILAR => BETWEEN_LIKE_PREC, | ||
_ => self.prec_unknown(), | ||
}, | ||
Token::Word(w) if w.keyword == Keyword::IS => IS_PREC, | ||
Token::Word(w) if w.keyword == Keyword::IN => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::BETWEEN => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::LIKE => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::ILIKE => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::RLIKE => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::REGEXP => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::SIMILAR => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::OPERATOR => BETWEEN_LIKE_PREC, | ||
Token::Word(w) if w.keyword == Keyword::DIV => MUL_DIV_MOD_OP_PREC, | ||
Token::Word(w) if w.keyword == Keyword::COLLATE => COLLATE_PREC, | ||
Token::Eq | ||
| Token::Lt | ||
| Token::LtEq | ||
| Token::Neq | ||
| Token::Gt | ||
| Token::GtEq | ||
| Token::DoubleEq | ||
| Token::Tilde | ||
| Token::TildeAsterisk | ||
| Token::ExclamationMarkTilde | ||
| Token::ExclamationMarkTildeAsterisk | ||
| Token::DoubleTilde | ||
| Token::DoubleTildeAsterisk | ||
| Token::ExclamationMarkDoubleTilde | ||
| Token::ExclamationMarkDoubleTildeAsterisk | ||
| Token::Spaceship => EQ_PREC, | ||
Token::Caret => CARET_PREC, | ||
Token::Plus | Token::Minus => PLUS_MINUS_PREC, | ||
Token::Mul | Token::Div | Token::Mod => MUL_DIV_MOD_OP_PREC, | ||
Token::DoubleColon => DOUBLE_COLON_PREC, | ||
Token::LBracket => BRACKET_PREC, | ||
Token::Arrow | ||
| Token::LongArrow | ||
| Token::HashArrow | ||
| Token::HashLongArrow | ||
| Token::AtArrow | ||
| Token::ArrowAt | ||
| Token::HashMinus | ||
| Token::AtQuestion | ||
| Token::AtAt | ||
| Token::Question | ||
| Token::QuestionAnd | ||
| Token::QuestionPipe | ||
| Token::ExclamationMark | ||
| Token::Overlap | ||
| Token::CaretAt | ||
| Token::StringConcat | ||
| Token::Sharp | ||
| Token::ShiftRight | ||
| Token::ShiftLeft | ||
| Token::Pipe | ||
| Token::Ampersand | ||
| Token::CustomBinaryOperator(_) => PG_OTHER_PREC, | ||
_ => self.prec_unknown(), | ||
}; | ||
Some(Ok(precedence)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear a little bit for the added maintenance burden here. What do you think @alamb ?
There are still many SQL tokens to add to sqlparser, and there is a chance people will add them only in the generic dialect, leaving postgres silently broken... Isn't there ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think dialect divergence is a possibly and likely made more so by this PR
In fact, in some ways the point of this PR is to start to diverge the dialects with more accurate precedence rules 🤔
The only way I know to guard against this divergence is with test coverage for all new features (which we are pretty good about in general).
Do you have any other thoughts about how to improve the situation @lovasoa ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could change postgres' get_next_precedence
to just call the generic dialect's get_next_precedence
and "fixing up" the result. This way, we remove the duplication, and when we change the generic dialect, the postgres version reflects the changes automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like
fn get_next_precedence(&self, parser: &Parser) -> Option<Result<u8, ParserError>> {
if some_particular_case { return some_particular_precedence }
GenericDialect.get_next_precedence(parser)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lovasoa I agree it's unfortunate, there's so much duplication in the the postgres dialect.
However your GenericDialect.get_next_precedence(parser)
solution don't help, we can already return None
from get_next_precedence
to fallback to the default behaviour.
The reason the Postgres dialect is completely reimplementing the logic is that it has to change the precedence values for virtually every token to match Postgres.
We can either:
- implement ~15 more methods on the trait to get every single precedence value
- or, add another abstraction on top of
Token
and replace all theprec_*
methods with a singleget_prec(&self, precedence_level: PrecedenceLevel)
These would avoid the need to duplicate the logic in Dialect::get_next_precedence_default
, but would introduce (albeit less) duplication elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in #1379.
Follow on PR to update comments #1366 |
@@ -67,6 +85,102 @@ impl Dialect for PostgreSqlDialect { | |||
) | |||
} | |||
|
|||
fn get_next_precedence(&self, parser: &Parser) -> Option<Result<u8, ParserError>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb @samuelcolvin : wouldn't it be better to have a Result<Option<u8>>
instead of an Option<Result<u8>>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would:
- differ from the signature of
Iterator::next()
- be a breaking change to the API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think the transpose()
method can be used to convert Option<Result<..>>
to Result<Option<..>>
https://doc.rust-lang.org/std/option/enum.Option.html#method.transpose
I don't have any particular preference for one over the other
This is ugly but working, I wanted to get feedback before proceeding.I think this is now in a good state, definitely ready to review.This solves #814 (comment), e.g. it gives the right AST for
SELECT foo->'bar'='spam'
etc.But before I go further, I want to confirm a few things?
Can we update precedence globally?
The claim at
https://github.com/sqlparser-rs/sqlparser-rs/blob/cc13841a370190df00cfc623593453054ac187e9/src/parser/mod.rs#L2972
Isn't really true at all,
sqlparser-rs
doesn't use the lexical precedence from PostgreSQL 7, even if it did, that's extremely old.Could we just update those values to reference a more recent version?
I assume not since that order is presumably a not-too-bad approximation for lots of databases, and changing it would be a big breaking change
Can we even update
PostgresSQL
dialectPresumably people are relying on the current logic, will you even accept a PR to update
PostgreSqlDialect
even if it's wrong now?Or should a create a
RecentPostgreSqlDialect
as a new dialect?What about operators that don't work in some dialects?
Postgres doesn't actually support
XOR
, but there are tests asserting that it works, do you want to keep supporting it with an invented precedence, or cause an error?Can I move precedence numbers to the
Dialect
trait?The
Precedence
enum shown here is pretty ugly, would you mind me moving the precedence stuff, including the default implementation ofget_next_precedence
to theDialect
trait?