Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix boolean literals #357

Merged
merged 1 commit into from
Mar 14, 2024
Merged

Fix boolean literals #357

merged 1 commit into from
Mar 14, 2024

Conversation

aholyoke
Copy link
Contributor

Reproducing the issue:

import sqlalchemy
from sqlalchemy import select

engine = sqlalchemy.create_engine(
    f"databricks://token:{os.environ['DATABRICKS_TOKEN']}@{os.environ['DATABRICKS_SERVER_HOSTNAME']}:443/default",
    connect_args={
        "http_path": os.environ["DATABRICKS_HTTP_PATH"],
    },
)
connection = engine.connect()
result = connection.execute(select(True, False))

Go to Query History in the workspace. See that it executes: SELECT 1 AS anon_1, 0 AS anon_2

This becomes a problem when trying to compare to a boolean column:

In [11]: from sqlalchemy import MetaData, Column, Table, Boolean
    ...: metadata_obj = MetaData()
    ...: table = Table("table_with_bool", metadata_obj, Column("value", Boolean()))
    ...: expression = select(1).select_from(table).where(table.c.value == True)
    ...: print(str(expression.compile(dialect=engine.dialect)))
SELECT 1
FROM table_with_bool
WHERE table_with_bool.value = 1

In Databricks this produces the error:
[DATATYPE_MISMATCH.BINARY_OP_DIFF_TYPES] Cannot resolve "(value = 1)" due to data type mismatch: the left and right operands of the binary operator have incompatible types ("BOOLEAN" and "INT").; line 3 pos 6

SQLAlchemy has a flag called supports_native_boolean that controls this behaviour and is set to false by default. Setting this flag to true starts rendering booleans correctly:

In [12]: class FixedDialect(DatabricksDialect):
    ...:     supports_native_boolean: bool = True
    ...:
    ...:
    ...: metadata_obj = MetaData()
    ...: table = Table("table_with_bool", metadata_obj, Column("value", Boolean()))
    ...: expression = select(1).select_from(table).where(table.c.value == True)
    ...: print(str(expression.compile(dialect=FixedDialect())))
SELECT 1
FROM table_with_bool
WHERE table_with_bool.value = true

At growthloop.com we've been using this fix in production for about a month and haven't had any issues.

I'm confused as to why some of the tests are failing on main when the description of _regression.py in README.tests.md implies tests in this file should be passing, maybe I'm running them incorrectly? Nevertheless, this change seems to be fixing a couple of the reusable dialect tests.
Before:

test/_regression.py::BooleanTest_databricks+databricks::test_null PASSED                 [ 25%]
test/_regression.py::BooleanTest_databricks+databricks::test_render_literal_bool FAILED  [ 50%]
test/_regression.py::BooleanTest_databricks+databricks::test_round_trip PASSED           [ 75%]
test/_regression.py::BooleanTest_databricks+databricks::test_whereclause FAILED          [100%]

After:

test/_regression.py::BooleanTest_databricks+databricks::test_null PASSED                 [ 25%]
test/_regression.py::BooleanTest_databricks+databricks::test_render_literal_bool PASSED  [ 50%]
test/_regression.py::BooleanTest_databricks+databricks::test_round_trip PASSED           [ 75%]
test/_regression.py::BooleanTest_databricks+databricks::test_whereclause PASSED          [100%]

Signed-off-by: Alex Holyoke <alexander.holyoke@growthloop.com>
@aholyoke
Copy link
Contributor Author

@susodapop Can you take a look at this when you get a chance?

@yunbodeng-db
Copy link
Contributor

@susodapop Can you take a look at this when you get a chance?

@andrefurlan-db @benc-db @jackyhu-db can you take a look? And run the integration tests of our own before approving merge?

@susodapop
Copy link
Contributor

Hey @aholyoke thanks for the ping. I'm not working on this repository anymore so what I can do is limited but you are welcome to ping me with questions about the sqlalchemy dialect any time.

What version of SQLAlchemy are you running?

Here's what I can tell you and the Databricks team:

First, consider the tests that are failing. Notice that test_round_trip is passing even before @aholyoke's change. But test_render_literal_bool fails. That's because SQLAlchemy runs this test in two ways: the one that passes is using native parameters (i.e. the parameter value is being bound at the server). The one that fails does so when SQLAlchemy performs its own parameter inlining, which is separate from the connector's own use_inline_params functionality.

The change proposed by @aholyoke in this PR is a good one as it simply instructs SQLAlchemy to render the string literals true and false rather than 1 and 0.

I'm confused as to why some of the tests are failing on main when the description of _regression.py in README.tests.md implies tests in this file should be passing

You understand _regression.py correctly. The test classes in this file should always pass. The BooleanTest was passing the last time I ran this suite against the Databricks internal testing environment. Since I don't have a test environment anymore I can't dig deeper into why this passed before or why it would now be failing. But assuming that it does fail, it should be fixed.

If I had reviewing privileges on this PR I would approve this, but request one change: Add a unit test for this compilation that mirrors your exact reproduction steps. Here's a patch that does this (if you add me as an editor on your branch I can also just push this to it so it's bundled into your PR):

diff --git a/src/databricks/sqlalchemy/test_local/test_ddl.py b/src/databricks/sqlalchemy/test_local/test_ddl.py
index a83ff24..870b4c8 100644
--- a/src/databricks/sqlalchemy/test_local/test_ddl.py
+++ b/src/databricks/sqlalchemy/test_local/test_ddl.py
@@ -1,5 +1,5 @@
 import pytest
-from sqlalchemy import Column, MetaData, String, Table, create_engine
+from sqlalchemy import Column, MetaData, String, Table, create_engine, select
 from sqlalchemy.schema import (
     CreateTable,
     DropColumnComment,
@@ -93,3 +93,18 @@ class TestTableCommentDDL(DDLTestBase):
         stmt = DropTableComment(table_with_comment)
         output = self.compile(stmt)
         assert output == "COMMENT ON TABLE martin IS NULL"
+
+
+class TestBooleanLiteralDDL(DDLTestBase):
+
+    @pytest.mark.parametrize(
+        "select_input, expected_output",
+        [
+            ((True,), "select true"),
+            ((False,), "select false"),
+        ],
+    )
+    def test_literal_rendering(self, select_input, expected_output):
+        stmt = select(*select_input)
+        compiled = self.compile(stmt)
+        assert expected_output in compiled.lower()

@susodapop
Copy link
Contributor

Also @aholyoke are you able to share the exact error message you see when test_render_literal_bool fails? Looking through the test definition, I'm actually a little surprised that it's failing at all. Here's the test definition:

    def test_render_literal_bool(self, literal_round_trip):
        literal_round_trip(Boolean(), [True, False], [True, False])

literal_round_trip is a fixture built-in to the SQLAlchemy test suite. It writes a SELECT statement and passes it Python types that will be rendered as string literals. Then it reads the return value of the select and compares the data received to the data sent.

  • Boolean() is the SQLAlchemy type mapping that will be used for the test. It will use sqlalchemy.types.Boolean to render the value in the query and to marshal the value returned by the query
  • [True, False] are the values that will be passed to Boolean() for rendering during the emission of the SELECT
  • [True, False] are the expected return values from the subsequent SELECT

The Boolean() type prior to your change would write True as 1 and False as 0. Which is what causes the type mismatch error you see from Databricks when using .where(some_col == True) in your queries. But I don't think that would be detectable in this test because sqlalchemy.types.Boolean() interprets 1 as True and 0 as False when supports_native_boolean is disabled.

In other words: this test should pass before your change. Thus my question at the top of this comment.

I don't have a databricks environment I can run tests against anymore so I'm pawing around in the dark.

@benc-db
Copy link
Collaborator

benc-db commented Mar 14, 2024

Unit tests pass, trying integration now.

@benc-db benc-db merged commit c103352 into databricks:main Mar 14, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants