Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: fix meta/has_table #6627

Closed

Conversation

jorisvandenbossche
Copy link
Member

Further work for cleaning up the sql code (#6292).

  • the meta attribute is not updated automatically, with the consequence that eg when you delete a table from sql directly, the has_table function does not work anymore:
    • I added a test for that
    • I converted the meta attribute to one which is always updated when called. @mangecoeur looks OK?
  • now the tests are skipped when no connection could be made. @jreback, I just did a raise nose.SkipTest inside the setup function of the test class. -> moved and merged this in TST: skip sql tests if connection to server fails #6651

try:
import pymysql
self.driver = 'pymysql'
def setUp(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you want to have a test setup() method to install modules and connect (then can define the setup_imports() and setup_connection() or something in the sub class of the test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am no sure I fully understand what you mean.
Something like this? (just to clean up the code)

class _TestSQLAlchemy(PandasSQLTest):
    def set_up(self):
        setup_import()
        setup_driver()
        setup_connect()

    def setup_import():
        if not SQLALCHEMY_INSTALLED:
            raise nose.SkipTest('SQLAlchemy not installed')

    def setup_connect():
        try:
            self.conn = self.connect()
            self.pandasSQL = sql.PandasSQLAlchemy(self.conn)
        except sqlalchemy.exc.OperationalError:
            raise nose.SkipTest("Can't connect to server")

class TestPostgreSQLAlchemy(_TestSQLAlchemy):
    def setup_driver()
        try:
            import psycopg2
            self.driver = 'psycopg2'
        except ImportError:
            raise nose.SkipTest('psycopg2 not installed')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that's what I meant

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, ik keep this in mind for a pr to clean up the tests

@mangecoeur
Copy link
Contributor

@jorisvandenbossche I'm not 100% sure it's a good idea to always clear and reflect when you need Meta, in cases where you have a lot of tables it can take a few seconds. On the other hand you are right that if someone starts mixing in raw SQL (which i guess is very likely) we have no way of know whether Meta needs updating or not :/ I guess we can try it as a property and dive into potential performance issues later

@jreback
Copy link
Contributor

jreback commented Mar 13, 2014

can you invalidate the metadata _meta? when a raw string is passed? then only need to recreate if the user does raw stuff

@jorisvandenbossche
Copy link
Member Author

@mangecoeur I agree it seems not the best idea to always clear and reflect the meta, but I do not see another way to prevent that the meta will not be up to date anymore if one has executed some raw SQL code.

@jreback But the problem is that we don't know when the user does raw stuff. They could do it with sql.execute("DROP TABLE table"), and this we could maybe 'detect', but you can also do it via sqlalchemy of directly in eg PgAdminIII for PostgreSQL.

Maybe another way to prevent the possible performance issues is to limit the cases that meta is called in our code. Eg for has_table, sqlalchemy has also an has_table method directly on the engine that maybe has not to pass through meta.

@jreback
Copy link
Contributor

jreback commented Mar 14, 2014

what happens if you only reflect the meta once
if user does raw stuff then they should manually reflect if they are then using pandas functions
(or maybe a flag to do it every time)

I think this is users issue

@jorisvandenbossche
Copy link
Member Author

It is indeed maybe a user issue, but I think a common issue (as we already did it ourselves in the test suite (https://github.com/jorisvandenbossche/pandas/blob/sql2/pandas/io/tests/test_sql.py#L109), why one of the tests was not really working), so if we can easily prevent it, why not?

But I can see an argument to leave this as is, as it is the same with sqlalchemy, and we would just copy sqlalchemy behaviour then.
If we do that, i would certainly change has_table so this is up to date. So it would only be the direct use of PandasSQL.meta that would not always be up to date.

… has_table

Because meta is not automatically updated, has_table will fail when a table
is dropped with an sql query (not with drop_table function)
@jorisvandenbossche
Copy link
Member Author

@mangecoeur @hayd @jreback Anybody a strong opinion on this?

@jreback
Copy link
Contributor

jreback commented Mar 16, 2014

I think should be a passed in option when creating the engine defaulting to True is ok
maybe meta='always', and accept None which will allow the user to manually reflect (maybe should make reflect/clear a public method)

@jreback jreback added the SQL label Mar 22, 2014
@jreback jreback added this to the 0.14.0 milestone Mar 22, 2014
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.14.1, 0.14.0 May 16, 2014
@jorisvandenbossche
Copy link
Member Author

Pushed this to 0.14.1, as the OO api is also not yet finalized (and this only concerns that, not the functional api)

@mangecoeur
Copy link
Contributor

On refelction I like the idea of invalidating meta if the user does something "manual" like using raw SQL, but leave scope open for adding more useful functions that take advantage of SQLs expression API (or these could be added by people/projects who want to subclass the SQLAlchemy api class). Like I mentioned in another issue, it's important to make it possible for a user to manually manage reflection of metadata/supply their own metadata object for advanced use cases.

@hayd
Copy link
Contributor

hayd commented May 28, 2014

Perhaps raise a UserWarning if they have done something manual? Say that meta could be invalid / user should reflect ?

@mangecoeur
Copy link
Contributor

@hayd I think that might make it too easy for people to shoot themselves in the foot. Perhaps the best is to always reflect but have a class "reflect" or "get meta" function. An advanced user can just create a subclass and override that function.
OR we could think of a clean way to pass SQLAlchemy options into the object - preferably without spamming dozens of kwargs...

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.15.0, 0.14.1 Jul 1, 2014
@jreback
Copy link
Contributor

jreback commented Jan 25, 2015

@jorisvandenbossche keep this open?

@jreback
Copy link
Contributor

jreback commented Mar 5, 2015

@jorisvandenbossche ?

@jorisvandenbossche jorisvandenbossche modified the milestones: No action, 0.16.0 Mar 6, 2015
@jorisvandenbossche
Copy link
Member Author

Closing for now. In any case not applicable as is, but I should have a closer look about how meta is handled now (things changed somewhat).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants