Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Initial move of op registry to base_sql #2469

Conversation

matthewmturner
Copy link
Contributor

First commit to move impala registry ops to base_sql.

@datapythonista can you check this out when you get the chance? Its my first commit for #2463

I moved over the _operation_registry and then all the attendant functions needed to implement it.

Once you confirm this is in the right direction my understanding is the following are next steps:

  1. Remove underscores to make functions public
  2. Remove moved functions from impala.compiler
  3. Import the moved registry operations to impala
  4. Update other backends to use base_sql instead of impala.compiler as needed

Can you confirm / advise?

Thanks!

@pep8speaks
Copy link

pep8speaks commented Oct 14, 2020

Hello @matthewmturner! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 20:1: F401 'ibis.impala.compiler._truncate' imported but unused
Line 829:5: F841 local variable 'p' is assigned to but never used
Line 829:8: F841 local variable 'f' is assigned to but never used
Line 1082:28: F821 undefined name 'truncate'
Line 1083:23: F821 undefined name 'truncate'

Comment last updated at 2020-10-14 16:19:54 UTC

Copy link
Contributor

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @matthewmturner, what you mention in the description is correct.

Besides what you say, would be good if the operations that are not being used by any other backend than Impala are not moved to base sql, and stay in Impala. If that adds too much complexity to the PR, I guess it's ok to move everything (except the obvious only-Impala stuff, see comment).

Thanks for working on this!

return 'unix_timestamp({})'.format(t.translate(arg))


_impala_unit_names = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, and the next method, seem to be specific to Impala, we shouldn't move it to the base sql backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @datapythonista

tying back to my below comment to @jreback - will there be separate PRs for moving each backend to base_sql (i.e. the focus of this PR is correctly setting it up and thats it)

If it is the case I would think items 2-4 from my list above would be handled in separate PRs.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate PRs are prob best, though also happy to move a 'bunch of stuff' then clean up later (it is slightly harder to remove stuff this way); whichever way easier / makes more clear what is changing.

another way to approach is to move a complete piece of functionaility, and from all backends at once, that way moving whole things and PRs are reasonable and clear what is happening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving some functions at a time in different PRs sounds good. I think in this case may make things trickier, but if you leave the registry itself to the last PR can make sense.

But just to be clear. Updating one backend at a time (one PR for the imports of bigquery, another for clickhouse) means that we need to duplicate the functions in the Impala backend and in base_sql. I think we should avoid that. And the imports are very small, moving the code is the big part.

So, small PRs is fine, if splitted by different operators, but better let's not do one backend at a time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, makes sense. Thanks @jreback @datapythonista

So if i were to break it up by function type would it be good to do a separate PR per group within the operation_registry (although im not sure if this is contradictory to @datapythonista point of leaving registry itself to the last PR)?

The groups are (as identified by comments):

  • Unary operations
  • Unary aggregates
  • String operations
  • Timestamp operations
  • Other operations
  • RowNumber, and rank functions starts with 0 in Ibis-land

Still learning the structure of ibis so any guidance on approach is much appreciated!

Thx.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned leaving the registry to the end so you don't need to import impala from base sql, which will create circular dependencies.

But feel free to do it anyway you feel like, I'd just avoid havibg the same function twice, which will create a mess.

Starting by the beginning of the registry, move the first 10 functions (or any number), open a PR, and then continue with the next 10 sounds like a reasonable thing to do. But up to you, I'd move stuff, and once you're in a consistent state and you want a break, just open a PR for what you've got.

@datapythonista datapythonista changed the title Initial move of op registry to base_sql WIP: Initial move of op registry to base_sql Oct 14, 2020
import math
from io import StringIO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove from the inheriters of this (e.g. impala), otherwise you maybe importing things you don't need and/or missing things.

May want to move a very minimal amount, until we move move backends to use base_sql.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, will be doing that. I have only done the initial move from impala to base_sql so far, havent gone back to cleanup the inheriters yet.

Will there be separate issues/PRs for moving the other backends from impala to base_sql? So ill be leaving most of impala unchanged in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No please, I think that was some misunderstanding. Let's not make a copy of the code, move it here, and change the imports everywhere. Thanks!

@jreback jreback added the sqlalchemy SQLAlchemy-based backends label Oct 14, 2020
@datapythonista
Copy link
Contributor

@matthewmturner I think this has been superseded now, is it ok to close? Not a problem to keep it open if it's useful for you, but even after closed you can still check the diff, if that's being helpful.

@matthewmturner
Copy link
Contributor Author

@matthewmturner I think this has been superseded now, is it ok to close? Not a problem to keep it open if it's useful for you, but even after closed you can still check the diff, if that's being helpful.

yup, no problem to close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqlalchemy SQLAlchemy-based backends
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants