Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for save_table(..., mode="overwrite") to StatementExecutionBackend #74

Merged
merged 1 commit into from
Mar 27, 2024

Conversation

william-conti
Copy link
Contributor

No description provided.

Copy link

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.58%. Comparing base (8921e0f) to head (77fe395).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #74      +/-   ##
==========================================
+ Coverage   82.95%   84.58%   +1.62%     
==========================================
  Files           7        7              
  Lines         534      532       -2     
  Branches      105      105              
==========================================
+ Hits          443      450       +7     
+ Misses         55       50       -5     
+ Partials       36       32       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

❌ 17/18 passed, 1 failed, 2 skipped, 11m29s total

❌ test_overwrite: databricks.sdk.errors.platform.BadRequest: [INSUFFICIENT_PERMISSIONS] Insufficient privileges: (2.587s)
databricks.sdk.errors.platform.BadRequest: [INSUFFICIENT_PERMISSIONS] Insufficient privileges:
User does not have permission CREATE,USAGE on database `default`.
11:01 DEBUG [databricks.sdk] Loaded from environment
11:01 DEBUG [databricks.sdk] Ignoring pat auth, because metadata-service is preferred
11:01 DEBUG [databricks.sdk] Ignoring basic auth, because metadata-service is preferred
11:01 DEBUG [databricks.sdk] Attempting to configure auth: metadata-service
11:01 INFO [databricks.sdk] Using Databricks Metadata Service authentication
[gw9] linux -- Python 3.10.13 /home/runner/work/lsql/lsql/.venv/bin/python
11:01 DEBUG [databricks.sdk] Loaded from environment
11:01 DEBUG [databricks.sdk] Ignoring pat auth, because metadata-service is preferred
11:01 DEBUG [databricks.sdk] Ignoring basic auth, because metadata-service is preferred
11:01 DEBUG [databricks.sdk] Attempting to configure auth: metadata-service
11:01 INFO [databricks.sdk] Using Databricks Metadata Service authentication
11:01 DEBUG [databricks.labs.lsql.backends] [api][execute] CREATE TABLE IF NOT EXISTS default.foo (first STRING NOT NULL, second BOOLEAN NOT NULL) USING DE... (3 more bytes)
11:01 DEBUG [databricks.labs.lsql.core] Executing SQL statement: CREATE TABLE IF NOT EXISTS default.foo (first STRING NOT NULL, second BOOLEAN NOT NULL) USING DELTA
11:01 DEBUG [databricks.sdk] POST /api/2.0/sql/statements/
> {
>   "format": "JSON_ARRAY",
>   "statement": "CREATE TABLE IF NOT EXISTS default.foo (first STRING NOT NULL, second BOOLEAN NOT NULL) USING DE... (3 more bytes)",
>   "warehouse_id": "TEST_DEFAULT_WAREHOUSE_ID"
> }
< 200 OK
< {
<   "statement_id": "01eeec29-68e4-1703-9a59-272346f4700f",
<   "status": {
<     "error": {
<       "error_code": "BAD_REQUEST",
<       "message": "[INSUFFICIENT_PERMISSIONS] Insufficient privileges:\nUser does not have permission CREATE,USAGE o... (21 more bytes)"
<     },
<     "state": "FAILED"
<   }
< }
11:01 DEBUG [databricks.sdk] Loaded from environment
11:01 DEBUG [databricks.sdk] Ignoring pat auth, because metadata-service is preferred
11:01 DEBUG [databricks.sdk] Ignoring basic auth, because metadata-service is preferred
11:01 DEBUG [databricks.sdk] Attempting to configure auth: metadata-service
11:01 INFO [databricks.sdk] Using Databricks Metadata Service authentication
11:01 DEBUG [databricks.labs.lsql.backends] [api][execute] CREATE TABLE IF NOT EXISTS default.foo (first STRING NOT NULL, second BOOLEAN NOT NULL) USING DE... (3 more bytes)
11:01 DEBUG [databricks.labs.lsql.core] Executing SQL statement: CREATE TABLE IF NOT EXISTS default.foo (first STRING NOT NULL, second BOOLEAN NOT NULL) USING DELTA
11:01 DEBUG [databricks.sdk] POST /api/2.0/sql/statements/
> {
>   "format": "JSON_ARRAY",
>   "statement": "CREATE TABLE IF NOT EXISTS default.foo (first STRING NOT NULL, second BOOLEAN NOT NULL) USING DE... (3 more bytes)",
>   "warehouse_id": "TEST_DEFAULT_WAREHOUSE_ID"
> }
< 200 OK
< {
<   "statement_id": "01eeec29-68e4-1703-9a59-272346f4700f",
<   "status": {
<     "error": {
<       "error_code": "BAD_REQUEST",
<       "message": "[INSUFFICIENT_PERMISSIONS] Insufficient privileges:\nUser does not have permission CREATE,USAGE o... (21 more bytes)"
<     },
<     "state": "FAILED"
<   }
< }
[gw9] linux -- Python 3.10.13 /home/runner/work/lsql/lsql/.venv/bin/python

Running from acceptance #46

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's take the risk

Comment on lines +155 to 157
if mode == "overwrite":
self.execute(f"TRUNCATE TABLE {full_name}")
for i in range(0, len(rows), self._max_records_per_batch):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'd need to properly support overwrites in the raw sql shape to keep the same'ish semantics as Spark:

  1. INSERT INTO {full_name}_tmp ...
  2. CREATE OR REPLACE TABLE {full_name} AS SELECT * FROM {full_name}_tmp
  3. DROP TABLE {full_name}_tmp

otherwise the failure of overwrite will leave the table in a corrupt state.

@nfx nfx changed the title Adding overwrite support for backends Add support for save_table(..., mode="overwrite") to StatementExecutionBackend Mar 27, 2024
@nfx nfx merged commit ca137d1 into main Mar 27, 2024
9 of 10 checks passed
@nfx nfx deleted the overwrite_support branch March 27, 2024 13:26
nfx added a commit that referenced this pull request Mar 27, 2024
* Added support for `save_table(..., mode="overwrite")` to `StatementExecutionBackend` ([#74](#74)). In this release, we've added support for overwriting a table when saving data using the `save_table` method in the `StatementExecutionBackend`. Previously, attempting to use the `overwrite` mode would raise a `NotImplementedError`. Now, when this mode is specified, the method first truncates the table before inserting the new rows. The truncation is done using the `execute` method to run a `TRUNCATE TABLE` SQL command. Additionally, we've added a new integration test, `test_overwrite`, to the `test_deployment.py` file to verify the new `overwrite` mode functionality. A new option, `mode="overwrite"`, has been added to the `save_table` method, allowing for the existing data in the table to be deleted and replaced with the new data being written. We've also added two new test cases, `test_statement_execution_backend_save_table_overwrite_empty_table` and `test_mock_backend_overwrite`, to verify the new functionality. It's important to note that the method signature has been updated to include a default value for the `mode` parameter, setting it to `append` by default. This change does not affect the functionality and only provides a more convenient default behavior for users of the method.
@nfx nfx mentioned this pull request Mar 27, 2024
nfx added a commit that referenced this pull request Mar 27, 2024
* Added support for `save_table(..., mode="overwrite")` to
`StatementExecutionBackend`
([#74](#74)). In this
release, we've added support for overwriting a table when saving data
using the `save_table` method in the `StatementExecutionBackend`.
Previously, attempting to use the `overwrite` mode would raise a
`NotImplementedError`. Now, when this mode is specified, the method
first truncates the table before inserting the new rows. The truncation
is done using the `execute` method to run a `TRUNCATE TABLE` SQL
command. Additionally, we've added a new integration test,
`test_overwrite`, to the `test_deployment.py` file to verify the new
`overwrite` mode functionality. A new option, `mode="overwrite"`, has
been added to the `save_table` method, allowing for the existing data in
the table to be deleted and replaced with the new data being written.
We've also added two new test cases,
`test_statement_execution_backend_save_table_overwrite_empty_table` and
`test_mock_backend_overwrite`, to verify the new functionality. It's
important to note that the method signature has been updated to include
a default value for the `mode` parameter, setting it to `append` by
default. This change does not affect the functionality and only provides
a more convenient default behavior for users of the method.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants