-
Notifications
You must be signed in to change notification settings - Fork 625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Initial move of op registry to base_sql #2469
WIP: Initial move of op registry to base_sql #2469
Conversation
Hello @matthewmturner! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-10-14 16:19:54 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @matthewmturner, what you mention in the description is correct.
Besides what you say, would be good if the operations that are not being used by any other backend than Impala are not moved to base sql, and stay in Impala. If that adds too much complexity to the PR, I guess it's ok to move everything (except the obvious only-Impala stuff, see comment).
Thanks for working on this!
ibis/backends/base_sql/__init__.py
Outdated
return 'unix_timestamp({})'.format(t.translate(arg)) | ||
|
||
|
||
_impala_unit_names = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, and the next method, seem to be specific to Impala, we shouldn't move it to the base sql backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @datapythonista
tying back to my below comment to @jreback - will there be separate PRs for moving each backend to base_sql
(i.e. the focus of this PR is correctly setting it up and thats it)
If it is the case I would think items 2-4 from my list above would be handled in separate PRs.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate PRs are prob best, though also happy to move a 'bunch of stuff' then clean up later (it is slightly harder to remove stuff this way); whichever way easier / makes more clear what is changing.
another way to approach is to move a complete piece of functionaility, and from all backends at once, that way moving whole things and PRs are reasonable and clear what is happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving some functions at a time in different PRs sounds good. I think in this case may make things trickier, but if you leave the registry itself to the last PR can make sense.
But just to be clear. Updating one backend at a time (one PR for the imports of bigquery, another for clickhouse) means that we need to duplicate the functions in the Impala backend and in base_sql. I think we should avoid that. And the imports are very small, moving the code is the big part.
So, small PRs is fine, if splitted by different operators, but better let's not do one backend at a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, makes sense. Thanks @jreback @datapythonista
So if i were to break it up by function type would it be good to do a separate PR per group within the operation_registry (although im not sure if this is contradictory to @datapythonista point of leaving registry itself to the last PR)?
The groups are (as identified by comments):
- Unary operations
- Unary aggregates
- String operations
- Timestamp operations
- Other operations
- RowNumber, and rank functions starts with 0 in Ibis-land
Still learning the structure of ibis so any guidance on approach is much appreciated!
Thx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned leaving the registry to the end so you don't need to import impala from base sql, which will create circular dependencies.
But feel free to do it anyway you feel like, I'd just avoid havibg the same function twice, which will create a mess.
Starting by the beginning of the registry, move the first 10 functions (or any number), open a PR, and then continue with the next 10 sounds like a reasonable thing to do. But up to you, I'd move stuff, and once you're in a consistent state and you want a break, just open a PR for what you've got.
ibis/backends/base_sql/__init__.py
Outdated
import math | ||
from io import StringIO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls remove from the inheriters of this (e.g. impala), otherwise you maybe importing things you don't need and/or missing things.
May want to move a very minimal amount, until we move move backends to use base_sql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, will be doing that. I have only done the initial move from impala to base_sql so far, havent gone back to cleanup the inheriters yet.
Will there be separate issues/PRs for moving the other backends from impala to base_sql? So ill be leaving most of impala unchanged in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No please, I think that was some misunderstanding. Let's not make a copy of the code, move it here, and change the imports everywhere. Thanks!
@matthewmturner I think this has been superseded now, is it ok to close? Not a problem to keep it open if it's useful for you, but even after closed you can still check the diff, if that's being helpful. |
yup, no problem to close. |
First commit to move impala registry ops to base_sql.
@datapythonista can you check this out when you get the chance? Its my first commit for #2463
I moved over the
_operation_registry
and then all the attendant functions needed to implement it.Once you confirm this is in the right direction my understanding is the following are next steps:
impala.compiler
base_sql
instead ofimpala.compiler
as neededCan you confirm / advise?
Thanks!