Don't use Pandas for SQLTableCheckOperator #25822

Merged
merged 1 commit into from
Aug 19, 2022


ashb
Member

@ashb ashb commented Aug 19, 2022

Pandas is an optional extra for common-sql provider, so forcing it for
a query that is going to return a couple of rows is not a good idea.

(We didn't notice this in CI because our tests have everything installed.)
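
To illustrate the point, here is a minimal sketch of consuming a small two-column result as plain tuples, with no pandas import. It assumes the hook returns rows as `(check_name, check_result)` pairs; `FakeHook` and `failed_checks` are hypothetical names made up for this example and are not part of Airflow or of this PR.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, int]


class FakeHook:
    """Hypothetical stand-in for a database hook (assumption, not Airflow code)."""

    def __init__(self, rows: Sequence[Row]) -> None:
        self._rows = list(rows)

    def get_records(self, sql: str) -> List[Row]:
        # A real hook would run the query; this one returns canned rows.
        return self._rows


def failed_checks(records: Sequence[Row]) -> List[str]:
    """Return the names of checks whose result is falsy."""
    return [name for name, result in records if not result]


hook = FakeHook([("row_count_check", 1), ("col_sum_check", 0)])
records = hook.get_records("SELECT check_name, check_result FROM ...")
print(failed_checks(records))
```

For a handful of rows, a list comprehension like this covers the filtering that would otherwise need a DataFrame.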



@ashb
Member Author

ashb commented Aug 19, 2022

/cc @eladkal @potiuk @denimalpaca

@@ -385,17 +385,15 @@ def execute(self, context: 'Context'):
         self.sql = f"SELECT check_name, check_result FROM ({checks_sql}) "
         f"AS check_table {partition_clause_statement};"

-        records = hook.get_pandas_df(self.sql)
+        records = hook.get_records(self.sql)
Contributor

Did you forget to commit something? Tests fail:

> records = hook.get_records(self.sql)
E AttributeError: 'MockHook' object has no attribute 'get_records'
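
A failure of this kind can be reproduced in isolation. The sketch below assumes the test double only exposed `get_pandas_df`; it uses `unittest.mock.MagicMock` with a `spec` list rather than the actual `MockHook` from the test suite, whose definition is not shown here, and the row values are made up.

```python
from unittest.mock import MagicMock

# A mock limited to the old method: any other attribute raises AttributeError.
old_hook = MagicMock(spec=["get_pandas_df"])
try:
    old_hook.get_records("SELECT 1")
    raised = False
except AttributeError:
    raised = True

# A mock that exposes get_records and returns rows as plain tuples.
new_hook = MagicMock(spec=["get_records"])
new_hook.get_records.return_value = [("row_count_check", 1)]
rows = new_hook.get_records("SELECT 1")
```

Restricting the mock with `spec` makes the test fail loudly when the operator calls a method the double does not provide, which is the behaviour seen in the traceback above.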

Member Author

No, I just failed to run the rest of the tests. (I only checked my new test, which hits a real DB and which I added in a parallel PR, not the mocked one.)

Pandas is an optional extra for common-sql provider, so _forcing_ it for
a query that is going to return a couple of rows is not a good idea
@ashb ashb force-pushed the no-pandas-for-check-sql-operator branch from 3acfa81 to 61a2d66 Compare August 19, 2022 13:47
@ashb ashb requested a review from norm August 19, 2022 13:48
Contributor

@norm norm left a comment

Assuming the tests pass, LGTM.

5 participants