Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Make bigquery backend not depend on impala backend (part 1) #2449

Merged
merged 11 commits into from
Oct 12, 2020
Merged

CLN: Make bigquery backend not depend on impala backend (part 1) #2449

merged 11 commits into from
Oct 12, 2020

Conversation

datapythonista
Copy link
Contributor

The bigquery backend is currently reusing a lot of functionality from the impala backend, which IMO doesn't make any sense.

The reason is that the impala backend compiler has many things that are generic. I guess it was the only backend when it was implemented, and when bigquery was implemented, instead of moving the generic functionality to a base module for all backends to import, these generic functionality was simply imported from impala.

Not sure why this hack hasn't created more problems until now, but we're facing some in #2435.

This PR creates a base_backend.py with the shared functionality. A follow up is needed to also move the common _operation_registry and its dependencies. This PR doesn't change any code, just moves functions.

CC: @jreback @tswast

@datapythonista datapythonista added refactor Issues or PRs related to refactoring the codebase impala The Apache Impala backend backends - bigquery labels Oct 6, 2020
@jreback
Copy link
Contributor

jreback commented Oct 6, 2020

should be sql_backend or just sql

@datapythonista
Copy link
Contributor Author

should be sql_backend or just sql

I guess you mean the name of the base_backend.py file. The problem I see is that we have sql/ for the sqlalchemy stuff. I can use sql_backend, but still a bit ambiguous I would say. I think some more refactoring will be needed after this, to have all the shared backend stuff in one place (I was thinking on backends/base, or backends/base_sql + backends/base_sqlalchemy). Happy to change it if you think it's important, but I think base_backend is good enough for now, since there is still a significant mess on the shared backend functionality.

@jreback
Copy link
Contributor

jreback commented Oct 6, 2020

backends/base, or backends/base_sql + backends/base_sqlalchemy

these are better; you already have a naming scheme going keep using it

its not an entry point but ibis.backends.base_backend is very weird

@datapythonista
Copy link
Contributor Author

its not an entry point but ibis.backends.base_backend is very weird

It's ibis.base_backend, but fair enough. Am I understanding correctly that backends/base_sql is ok? I didn't do it initially because I see backends/base_* having an ABC to inherit from. But probably worth having the code in place even if the ABC is not yet implemented.

@datapythonista
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Collaborator

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

This is very useful. Eventually I'd even like to add a base_zetasql once I get around to supporting Apache Beam SQL and/or Cloud Spanner, which all use the same syntax https://github.com/google/zetasql It's great to have some precedence for how to do this.

@datapythonista
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@datapythonista
Copy link
Contributor Author

azp run

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment. any ci / testing changes that work now that didn't work before?

@@ -24,6 +24,7 @@
import math

import ibis
import ibis.backends.base_sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use like
import ibis.backends.base_sql as bsql or similar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find using the name of the module much easier to read. Currently we're using (sample from impala/compiler.py):

import ibis.common.exceptions as com
import ibis.expr.analysis as L 
import ibis.expr.datatypes as dt
import ibis.expr.operations as ops
import ibis.expr.types as ir
import ibis.sql.compiler as comp

And seems like we're making our life very difficult for no reason. I'd rather keep the original module name, I find the code much clearer than keep adding aliases for everything.

@jreback jreback added this to the Next Feature Release milestone Oct 7, 2020
@jreback
Copy link
Contributor

jreback commented Oct 9, 2020

let's follow the current pattern

it's not idiomatic to use the full import

happy to entertain a change - but it should b comprehensive when all of the moving parts are finished (all of this moving of backends

@datapythonista
Copy link
Contributor Author

it's not idiomatic to use the full import

Why do you mean by that? My understanding is that what's Pythonic or idiomatic in Python is being explicit, simple and readable (I think those are the main takeaways from the zen of Python).

If I compare:

ir.CategoryValue

with

ibis.expr.types.CategoryValue

I think the latter is explicit, simpler and more readable (just more verbose too). I think for anyone new to Ibis (and not so new, like myself) having to learn all those alias, which don't even seem to make sense, is useless and a waste of time. If ibis.expr.types.CategoryValue, maybe CategogyValue should be in ibis.types instead, but placing things in a place and then aliasis and obfuscating where they are is not what I would call idiomatic.

We are moving a lot of stuff to backends/ now, seems like a great time to start doing things better, at least for that part of the code. And we're not creating aliases for any other backend (e.g. import ibis.backends.impala as imp) as far as I know.

So, -1 on aliasing the module. Please let me know if there is any other reason I'm missing for the alias to be preferred.

@jreback
Copy link
Contributor

jreback commented Oct 9, 2020

@datapythonista

my point is this is competely changing the current ibis style. you may not like it but changing ad-hoc on this is a big no-no.

further I don't find realy long imports useful at all, that'st he reason they are aliased.

@jreback
Copy link
Contributor

jreback commented Oct 9, 2020

We are moving a lot of stuff to backends/ now, seems like a great time to start doing things better, at least for that part of the code. And we're not creating aliases for any other backend (e.g. import ibis.backends.impala as imp) as far as I know.

if you want to clean all backends as a followon not averse, but please don't do this here.

@datapythonista
Copy link
Contributor Author

if you want to clean all backends as a followon not averse, but please don't do this her

Ok, I'll do that. In any case, we don't have an alias for any backend. Other backends are using the form from ibis.backends.base_sql import quote_identifier, literal.

I'll do that, even if I don't think I should be wasting time writing suboptimal code, when I already wrote the clean up of those imports and made the code clearer in this PR, which is a reafactoring of that exact part, and IMHO the best place to fix it.

As a side note, probably worth mentioning that I'm not happy with this policy of blocking PRs for very opinionated things. I see value in (discussing and) standardizing styles and best practices. But if every small detail in this projects needs to be based on your personal preference (and I do feel this way here and in #2451), I'll have to think if this is a good investment of my time working on Ibis. Cleaning the big mess in the Ibis CI and architecture is a lot of hard work. And if instead of having help, I've got impediments and micromanagement, I think I'll leave someone else to do it.

@jreback
Copy link
Contributor

jreback commented Oct 9, 2020

if you want to clean all backends as a followon not averse, but please don't do this her

Ok, I'll do that. In any case, we don't have an alias for any backend. Other backends are using the form from ibis.backends.base_sql import quote_identifier, literal.

I'll do that, even if I don't think I should be wasting time writing suboptimal code, when I already wrote the clean up of those imports and made the code clearer in this PR, which is a reafactoring of that exact part, and IMHO the best place to fix it.

As a side note, probably worth mentioning that I'm not happy with this policy of blocking PRs for very opinionated things. I see value in (discussing and) standardizing styles and best practices. But if every small detail in this projects needs to be based on your personal preference (and I do feel this way here and in #2451), I'll have to think if this is a good investment of my time working on Ibis. Cleaning the big mess in the Ibis CI and architecture is a lot of hard work. And if instead of having help, I've got impediments and micromanagement, I think I'll leave someone else to do it.

@datapythonista your efforts are appreciated, but at the same time, you are introducing a huge amount of change in a short period of time. i am fine with doing things as followons, but my point in review is to have consistency in things.

@jreback
Copy link
Contributor

jreback commented Oct 9, 2020

from ibis.backends.base_sql import quote_identifier, literal.
I would prefer this as well btw

@jreback
Copy link
Contributor

jreback commented Oct 9, 2020

just don't use the fully qualified form itself in the code, that is much longer.

@jreback jreback merged commit 8d81eb1 into ibis-project:master Oct 12, 2020
@jreback
Copy link
Contributor

jreback commented Oct 12, 2020

thanks @datapythonista

I suspect some backends that were not originally using impala, e.g. postgres likely could be refactored to use the new base_sql (and maybe push down some more things). I don't know if we have an issue for this, but could create one if not.

@datapythonista
Copy link
Contributor Author

I suspect some backends that were not originally using impala, e.g. postgres likely could be refactored to use the new base_sql (and maybe push down some more things). I don't know if we have an issue for this, but could create one if not.

You'll know better than me, but my understanding is that some backends depend on the custom compiler base_sql, and some other on the sqlalchemy compiler. I'm not familiar enough with the code to tell for sure, but to me what would be ideal is that all the SQL backends are implemented as sqlalchemy engines, and that sqlalchemy is a generic backend.

For now I'll create issues to finish this (move what it's still in Impala), and to move ibis/sql to ibis/backends/sqlalchemy is that makes sense to you.

matthewmturner added a commit to matthewmturner/ibis that referenced this pull request Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impala The Apache Impala backend refactor Issues or PRs related to refactoring the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants