-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(pinot): The python_date_format
for a temporal column was not being passed to get_timestamp_expr
#24942
Changes from 8 commits
d60873e
2367233
78d43a9
0fd9b9e
1301e22
ccdfe8d
c83dc2b
b7d494a
912fc24
61ddf51
01a041c
52268a1
f9ef8e4
8fa3524
e1c2a75
31d36cf
8a9ed76
81b8801
6e51f04
0042027
399abf7
15fc982
1e5cfc9
4c6af11
88f3f23
45d5c60
6c779ee
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -78,7 +78,9 @@ def get_timestamp_expr( | |
time_grain: Optional[str], | ||
) -> TimestampExpression: | ||
if not pdf: | ||
# If there is no python date format (pdf) given then we cannot determine how to correctly handle the timestamp | ||
raise NotImplementedError(f"Empty date format for '{col}'") | ||
|
||
is_epoch = pdf in ("epoch_s", "epoch_ms") | ||
|
||
# The DATETIMECONVERT pinot udf is documented at | ||
|
@@ -99,12 +101,13 @@ def get_timestamp_expr( | |
else: | ||
seconds_or_ms = "MILLISECONDS" if pdf == "epoch_ms" else "SECONDS" | ||
tf = f"1:{seconds_or_ms}:EPOCH" | ||
|
||
if time_grain: | ||
granularity = cls.get_time_grain_expressions().get(time_grain) | ||
if not granularity: | ||
raise NotImplementedError(f"No pinot grain spec for '{time_grain}'") | ||
else: | ||
return TimestampExpression("{{col}}", col) | ||
return TimestampExpression("{col}", col) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would you also mind adding some unit tests for Pinot which cover the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, not a problem. |
||
|
||
# In pinot the output is a string since there is no timestamp column like pg | ||
if cls._use_date_trunc_function.get(time_grain): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that this change may actually break existing logic—given it was explicitly set to
None
. Would you mind adding a unit test for this which helps not just to provide code coverage, but also helps reviewers et al. grok the consequence of the change.@zhaoyongjie it seems like you added this logic in #21163 and thus you probably have the most context as to why we historically weren't defining the
pdf
variable.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, in my testing whenever I tried to create a chart or use a dashboard, if a column was marked as
temporal
it would always callget_timestamp_expr
viaadhoc_column_to_sqla
which means that the user defined date format is never passed to the DB Engine Spec.It's possible that the root cause of the issue is that
get_timestamp_expr
is being called throughadhoc_column_to_sqla
which it should be getting called viaTableColumn.get_timestamp_expression
(the only other call path toget_timestamp_expr
I could find. But all my tests pointed toadhoc_column_to_sqla
being the root cause.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@john-bodley the "pdf" is a shortcut for "date format (seconds or milliseconds)", this code was existing in many years, the "pdf" only used in Calculated Column and Columns from database, but not used in Adhoc expression, so we shouldn't make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ege-st
The design of current Pinot DB spec is completely incorrect. Maintaining our own Pinot driver and db_spec should solve your issue.
driver at: https://github.com/BurdaForward/pinot-dbapi/tree/bf_release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhaoyongjie I'm aware of what the Python Date Format (PDF) represents, though thanks for clarifying that this shouldn't be used for ad-hoc expressions.
Note we do already have a Pino DB engine spec, but maybe only adding the
column_datatype_to_string
method is required.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ege-st I also wondered if this was an underlying issue with the Pino SQLAlchemy dialect. You might want to look into the
visit_label
method.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhaoyongjie I've confirmed that this error happens with the latest versions of Pinot: so Superset can't alias a projection to the same name as a column that already exists. I looked at the diff you provided but it appears to be diffing a version of
models.py
that is not the same as the one in themaster
branch.@john-bodley could you provide some more detail? Is SQL Alchemy generating the alias name used in the projection? If so, then it could be an issue with the dialect, but if Superset generates the alias label then I'm not sure how the dialect can address this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ege-st the git-diffs are from Superset 2.1.0 branch. There aren't many changes, so you should apply the changes manually 🖨️🖨️🖨️
You should change this part of code on Master branch
superset/superset/models/core.py
Lines 965 to 979 in cacad56
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhaoyongjie so I believe I figured out a workaround for the alias issue. If I just set
allows_alias_in_select = False
then the query generated by Superset does not use an alias and the query is then compatible with Pinot. So, I don't think any of the additional changes you kindly suggested are necessary.One question that I have is: what is the purpose of the
pdf
that gets defined in the dataset configuration? Since it isn't passed into the engine spec when creating a chart, it can't be used in the query generation, so it doesn't seem to serve a purpose?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ege-st
Sounds good! It should be worked.
I think the original design of "pdf" is a hard-code for getting a timestamp from a string, but a type conversion expression is more graceful, --- should push down the function and run in DB rather than calculate in client.